Medium has been my main writing platform for a long time. But this blog was started before I joined Medium, so I’ve been keeping it up and reposting stories from here to there. However, there’s no straightforward way to do that automatically, and doing it manually is a bit of a pain because the embeds, graphics etc. usually can’t just be copy-pasted; they need to be re-inserted by hand in the right places.
Therefore, I am changing the way I post. From now on, I won’t be publishing full stories here, only links to Medium. So, here goes:
First of all, a disclaimer: if you’re writing a microservice (which everyone does now, right?) and want it to be idiomatic, you don’t normally use several different data sources in it.
Why? Well, by definition, microservices should be loosely coupled so that they can be independent. Having several microservices write into the same database breaks this principle: your data can be changed by several independent actors, possibly in different ways, which makes it really difficult to reason about data consistency. You can also hardly say that the services are independent, since they all depend on at least one common thing: the shared (and possibly screwed) data. So, there’s a design pattern called Database Per Service, which is intended to solve this problem by enforcing one service per database. This means that every microservice serves as an intermediary between the clients and its data source, and the data can only be changed through the interface that this service provides.
However, is one service per database equal to one database per service? Nope, it isn’t. If you think about it, it’s not really the same thing.
This means that if we have several databases that are only accessed by one microservice, and any external access to these databases goes through the interface of this service, the service can still be considered idiomatic. It is still one service per database, though not one database per service.
Also, perhaps you don’t care about your microservices being idiomatic at all. That’s an option too. (That will be on your conscience though.)
So, when would we have several databases that we want to access from the same service? I can think of different options:
The data is too big to be in one database;
You are using databases as namespaces to just separate different pieces of data that belong to different domains or functional areas;
You need different access levels for the databases: perhaps one is mission-critical, so you put it behind all kinds of security layers, while the other isn’t that important and doesn’t need that kind of protection;
The databases are in different regions because they are written to by people in different places but need to be read from a central location (or vice versa);
And anything else, really, that brought this situation about and that you just need to live with.
If your application is a Spring Boot application and you use Mongo as a database, the easiest way to go is to use Spring Data repositories. You just add a dependency on the Spring Data MongoDB starter, spring-boot-starter-data-mongodb (we’ll use a Gradle project here as an example).
Actually, we are generating this example project with Spring Initializr, because it’s the easiest way to start a new Spring-based example. We have just selected Kotlin and Gradle in the generator settings and added Spring Web Starter and Spring Data MongoDB as dependencies. Let’s call the project multimongo.
Once we have created the project and downloaded the sources, we can see that Spring Initializr generated an application.properties file by default. I prefer yaml, so we’ll just rename it to application.yml and be done with it.
So. How do we set up access to our default mongo database using Spring Data? Nothing easier. This is what goes into the application.yml.
Now, let’s imagine a very simple and stupid case for our data split. Say we have a core database that’s storing the products for our web store. Then we have data about the price of the products; this data doesn’t need any access restriction as any user on the web can see the price, so we’ll call it external. However, we also have a price history, which we use for analytical purposes. This is limited access information, so we say, OK, it goes into a separate database which we’ll protect and call internal.
Obviously, for my case all of these are still on localhost and not protected, but bear with me, it is just an example.
We will also create three different directories to keep our data access related code in: data.core, data.external, and data.internal.
Product.kt keeps the entity and repository for the product, while ProductPrice.kt and ProductPriceHistory.kt represent the current and historical prices of the products. The entities and repos are pretty basic.
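To give a rough idea of what such an entity and repository pair can look like, here is a minimal sketch for the product. The package name com.example.multimongo.data.core, the collection name and the exact fields are assumptions made for illustration; the price classes would follow the same shape in their own packages.

```kotlin
package com.example.multimongo.data.core // hypothetical package name

import org.springframework.data.annotation.Id
import org.springframework.data.mongodb.core.mapping.Document
import org.springframework.data.mongodb.repository.MongoRepository

// Entity stored in the core database; the collection name is an assumption.
@Document(collection = "products")
data class Product(
    @Id val id: String? = null, // left null so that Mongo generates the ID on save
    val name: String,
    val sku: String
)

// Spring Data generates the implementation (findAll(), save(), ...) at runtime.
interface ProductRepository : MongoRepository<Product, String>
```

Keeping the entity and its repository in the same package is what later lets one configuration class pick up exactly this repository and nothing else.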
Now, let’s create a configuration for our default mongo.
We are using a MongoAutoConfiguration class here to create a default mongo client instance. However, we still need a MongoTemplate bean which we define explicitly.
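A core configuration along these lines might look like the sketch below. It is not necessarily the exact class from the example project: it assumes Spring Data MongoDB 3.x or later (where the factory type is called MongoDatabaseFactory), lets Spring Boot's auto-configuration supply that factory from the default connection settings, and uses hypothetical package and class names.

```kotlin
package com.example.multimongo.config // hypothetical package name

import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.context.annotation.Primary
import org.springframework.data.mongodb.MongoDatabaseFactory
import org.springframework.data.mongodb.core.MongoTemplate
import org.springframework.data.mongodb.repository.config.EnableMongoRepositories

// Only repositories under data.core are wired to this template.
@Configuration
@EnableMongoRepositories(
    basePackages = ["com.example.multimongo.data.core"], // assumed package
    mongoTemplateRef = "mongoTemplate"
)
class CoreMongoConfig {

    // The factory parameter is the auto-configured one for the default database.
    // @Primary makes this the template that wins when one is injected by type.
    @Primary
    @Bean
    fun mongoTemplate(mongoDatabaseFactory: MongoDatabaseFactory): MongoTemplate =
        MongoTemplate(mongoDatabaseFactory)
}
```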
As you can see, the core configuration only scans the core directory. This actually is the key to everything: we need to put our repositories in different directories, and those repositories will be scanned by different mongo templates. So, let’s create those additional mongo templates. We’re going to use a base class that will keep some shared functionality we’ll reuse to create the mongo clients.
And then, finally we create the two configurations to hold the mongo template instances for our external and internal databases.
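As an illustration, the base class and the external configuration could be sketched as follows. The property name mongodb.external.uri, the class names and the package names are all hypothetical, and the sketch assumes Spring Data MongoDB 3.x or later, where SimpleMongoClientDatabaseFactory accepts a connection string that includes the database name. The internal configuration would be the same with "internal" in place of "external".

```kotlin
package com.example.multimongo.config // hypothetical package name

import org.springframework.beans.factory.annotation.Value
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.data.mongodb.core.MongoTemplate
import org.springframework.data.mongodb.core.SimpleMongoClientDatabaseFactory
import org.springframework.data.mongodb.repository.config.EnableMongoRepositories

// Shared helper: builds a template from a connection string such as
// mongodb://localhost:27017/external (the database name is part of the URI).
abstract class ExtraMongoConfig {
    protected fun templateFor(uri: String): MongoTemplate =
        MongoTemplate(SimpleMongoClientDatabaseFactory(uri))
}

// Only repositories under data.external are wired to this template.
@Configuration
@EnableMongoRepositories(
    basePackages = ["com.example.multimongo.data.external"], // assumed package
    mongoTemplateRef = "externalMongoTemplate"
)
class ExternalMongoConfig(
    @Value("\${mongodb.external.uri}") private val uri: String // hypothetical property
) : ExtraMongoConfig() {

    @Bean
    fun externalMongoTemplate(): MongoTemplate = templateFor(uri)
}
```

In this sketch the extra database factory is deliberately not exposed as a bean, so that Spring Boot still auto-configures the default factory for the core database.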
So, we now have three mongo template beans that are created by mongoTemplate(), externalMongoTemplate(), and internalMongoTemplate() in three different configurations. These configurations scan different directories and use these different mongo template beans via a direct reference in the @EnableMongoRepositories annotation (its mongoTemplateRef attribute), which means they use the beans they create. Spring doesn’t have a problem with that; the dependencies will be resolved in the correct order.
So, how are we to check that everything is working? There’s one more step to be done: we need to initialize some data and then get it from the database.
Since it’s just an example, we’ll create some very basic data right when the application starts up, just to see that it’s there. We’ll use an ApplicationListener for that.
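A listener of that kind could be sketched like this. It assumes a Product entity with name and sku fields and a ProductRepository in a hypothetical data.core package; the class name and the sample values are made up, and the price repositories would be filled in the same way.

```kotlin
package com.example.multimongo // hypothetical package name

import com.example.multimongo.data.core.Product // assumed entity
import com.example.multimongo.data.core.ProductRepository // assumed repository
import org.springframework.boot.context.event.ApplicationReadyEvent
import org.springframework.context.ApplicationListener
import org.springframework.stereotype.Component

// Runs once the application has fully started and seeds the core database.
@Component
class SampleDataLoader(
    private val productRepository: ProductRepository
) : ApplicationListener<ApplicationReadyEvent> {

    override fun onApplicationEvent(event: ApplicationReadyEvent) {
        // Start from a clean collection so restarts do not pile up duplicates.
        productRepository.deleteAll()
        productRepository.saveAll(
            listOf(
                Product(name = "Teapot", sku = "SKU-001"),
                Product(name = "Mug", sku = "SKU-002")
            )
        )
    }
}
```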
How do we check then that the data has been saved to the database? Since it’s a web application, we’ll expose the data in the REST controller.
The REST controller just uses our repos to call the findAll() method. We aren’t transforming the data, we aren’t paging or sorting, we just want to see that something is there. Finally, it’s possible to start the application and see what happens.
Yay, there are the two products we created! We can see that Mongo assigned autogenerated IDs to them on save: we only defined the names and dummy SKU codes.
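For the products, such a controller might be as small as the sketch below. The /products path, the class name and the packages are assumptions; the Product entity and ProductRepository are expected to live in a hypothetical data.core package.

```kotlin
package com.example.multimongo.web // hypothetical package name

import com.example.multimongo.data.core.Product // assumed entity
import com.example.multimongo.data.core.ProductRepository // assumed repository
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.web.bind.annotation.RestController

@RestController
class ProductController(private val productRepository: ProductRepository) {

    // Returns everything in the collection as JSON: no paging, sorting or mapping.
    @GetMapping("/products")
    fun products(): List<Product> = productRepository.findAll()
}
```

Endpoints for the current and historical prices would look the same, each delegating to its own repository and therefore to its own database.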
However, how do we make sure that the data has really been saved to (and read from) different databases? For that, we can just use any mongo client application that allows us to connect to the local mongo instance (I am using the official tool from mongo — MongoDB Compass).
Let’s check the content in the database that’s holding our current prices.
We can also use an integration test to check the data instead of doing it manually, if we want to do everything right (actually, not quite everything: we’d need an embedded mongo database for the tests, but we’ll skip that part here to keep the tutorial simple). We’ll use MockMvc from the spring-test library for this purpose.
You can find the full working example here in my GitHub repo. Hope this helped you solve the issue of using several mongo instances in one Spring Boot web application! It’s not such a difficult problem, but also not quite trivial.
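A test of this kind could be sketched as follows. It assumes a hypothetical /products endpoint that returns a JSON array, JUnit 5, a Mongo instance reachable from the test, and the annotation packages used by Spring Boot 2.x and 3.x.

```kotlin
package com.example.multimongo // hypothetical package name

import org.junit.jupiter.api.Test
import org.springframework.beans.factory.annotation.Autowired
import org.springframework.boot.test.autoconfigure.web.servlet.AutoConfigureMockMvc
import org.springframework.boot.test.context.SpringBootTest
import org.springframework.test.web.servlet.MockMvc
import org.springframework.test.web.servlet.request.MockMvcRequestBuilders.get
import org.springframework.test.web.servlet.result.MockMvcResultMatchers.jsonPath
import org.springframework.test.web.servlet.result.MockMvcResultMatchers.status

// Starts the whole application context and calls the endpoint through MockMvc.
@SpringBootTest
@AutoConfigureMockMvc
class ProductEndpointTest(@Autowired private val mockMvc: MockMvc) {

    @Test
    fun `products endpoint returns a JSON array`() {
        mockMvc.perform(get("/products")) // assumed endpoint path
            .andExpect(status().isOk())
            .andExpect(jsonPath("\$").isArray())
    }
}
```

Because the test talks to a real database, it only checks the shape of the response rather than specific product names.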
When I was looking at the other examples on the web, I also read this article (called Spring Data Configuration: Multiple Mongo Databases by Azadi Bogolubov) and it was pretty good and comprehensive. However, it didn’t quite fit my case because it was overriding the automatic mongo configuration completely. I, on the other hand, wanted to still keep it for my default database, but not for the others. But the approach in that article is based on the same principle of using different mongo templates for scanning different repositories.
It’s just that, if you keep the default configuration, it is easy to get rid of the extra classes once something changes and, for example, all your data goes to the same database again.
Then you could simply clean up the non-default configurations, keep the default one and only change the scope that it’s scanning. The application would continue to work without a hitch. But both approaches work and are valid.
Well, to be exact, this is a recurrent event that happens once or twice per year. The event is called a Hack & Learn Week, and from the very name, you can draw a conclusion that if you are not in the mood for hacking, you can learn. The conclusion would be correct. All week long, workshops and talks also take place in the office, given by employees willing to share their knowledge. Almost everyone pitches in: either you’re hacking, learning, giving a talk, being one of the organisers, acting as a technical consultant ready to jump in and help the teams in some particular knowledge area, or an infra person helping the speakers and the teams with their setup. There’s also a yoga session and some massages thrown into the mix, so as you can imagine, the week is pretty colourful. There’s of course an option to just continue working, and some teams do just that, because of deadlines etc. However, you would be missing out if you didn’t use this opportunity to do something different.
A. has strong opinions on stuff we’re doing wrong at OLX (on the SRE side). Like, reinventing our own wheels for stuff and adding manual hacks that those wheels require. Also, stuff that complicates life along with simplifying it. Like having our own DSL which of course isn’t parsable by IDEs, therefore making code unreadable: it is really difficult to see what comes from where and the IDE goes crazy and underlines everything in red.
Are custom tools square wheels, or are they an attempt to fix the more “standard” square wheels? (Image taken from Giphy: https://giphy.com/gifs/square-wheels-UP5CZUXC5dH1K)
However, these custom tools also exist for a reason. Standard tools like Helm make you type complicated command lines to do things, there’s a lot of repetition, and the command syntax usually isn’t like any programming language. Therefore, a lot depends on SREs, who are humans and therefore make mistakes, like typos, which are really hard to catch while debugging. The custom tools attempt to express infra in terms of code, with objects and types that allow the compiler to catch errors faster and more easily.
This is a story about a rather unusual experiment, which our company ran with me as a (willing) guinea pig, to try and retrain a software developer as an SRE. SREs (or DevOps, and there’s a controversy on whether it’s the same job or not) are a hot item right now, I think maybe even more so than data scientists (well, I don’t have stats at hand to confirm it, so that’s a rather one-sided view). Anyway, our company was desperately searching for SREs, and then someone had a bright idea.
We have all those devs, and they are all technical people too, right? And they work with infrastructure too, only a bit on the other side, but at least they have some idea about it, right? And maybe retraining a senior developer would actually be easier and less costly than training a junior SRE?
Everything “as code” is all the rage now. What can we represent as code besides programs? First of all, infrastructure as code is gaining popularity: a look at its Google Trends graph is enough to see that it is steadily climbing year by year. Terraform, OpenShift, CloudFormation, Helm, Puppet and many other tools are representatives of this trend.
However, this article deals with something else entirely: diagrams as code. Why do it? Well, code has a few advantages over, well, diagrams:
It is readable. Well, at least good code is. A lot of people absorb written information better than anything else, despite that saying about one picture being better than a thousand words.
It is compact. A text file is usually many times smaller than any picture, and is therefore much easier to store in the repository.
Version control. You can keep pictures under version control, however, they are binary files, and the changes are therefore obfuscated. If you change the picture in a repo, people will not know what the change was about, until they check out the repo and have a look at the picture. The diff itself won’t be much help at all.
It is easy. It is much easier to type “Service A uses Service B” than draw those boxes on a diagram, label them, connect them with arrows etc. Especially for people who might be, let’s say, artistically challenged.
It turns out, however, that there’s a tool that allows you to have the best of both worlds. And this tool is PlantUML.
PlantUML basically lets you write text which is automatically transformed into diagrams. It has its own pretty simple DSL and supports many types of UML diagrams.
Also, it supports some non-UML diagrams which are pretty cool, for example the Wireframe diagrams for UI design, which seems a really interesting concept.
How to use PlantUML? Actually, in a hundred ways. It can be installed locally as a separate tool or as a plugin to basically anything (Wikis, forums, text editors, IDEs and what not, check the link and chances are, you will find at least several alternatives that you’re already using). As my tool of choice is IntelliJ IDEA, this is the plugin I use.
Let’s try a sequence diagram, because it’s the one that usually gives me a lot of headaches. (All those swimlanes and blocks that need to be aligned, don’t get me started.) We’re designing an automated restaurant order system (no waiter, just a tablet to order with, know what I mean?) and need a bird’s-eye view of the basic flow. We have a client who orders from the menu, an inventory against which the order is checked, and a feedback system to be able to correct the order. And we’ll put some queues in to make the process asynchronous (just because we are cool).
How will it look? Approximately like this.
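Here is a minimal sketch of such a diagram source, based only on the participants named in the text. The queue and database names and all message labels are made up for illustration. The PlantUML part is everything between @startuml and @enduml; it is held in a Kotlin string here only so that it can be written to a .puml file and opened with the plugin.

```kotlin
import java.io.File

// Hypothetical PlantUML source for the restaurant order flow.
val orderFlowDiagram = """
    @startuml
    actor Client
    participant MenuService
    participant InventoryService
    participant RequestQueue
    participant ResponseQueue
    database OrderDB

    Client -> MenuService : place order
    MenuService -> RequestQueue : enqueue order check
    RequestQueue -> InventoryService : deliver order check
    InventoryService -> OrderDB : record order status
    InventoryService -> ResponseQueue : enqueue result
    ResponseQueue -> MenuService : deliver result
    MenuService --> Client : confirm or ask to correct the order
    @enduml
""".trimIndent()

fun main() {
    // Write the diagram source next to the code so it can live under version control.
    File("order-flow.puml").writeText(orderFlowDiagram)
    println(orderFlowDiagram)
}
```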
We can clearly see that we have one actor (Client), four participants (MenuService, InventoryService and two queues for requests and responses) and a database to keep track of all this. The IDE plugin instantly transforms the code into this picture:
What can I do with it? I can export it into a picture and show it to anyone. Also, I can use the online demo server: just copy and paste the whole code into the textbox there and click Submit. The demo server will return a URL to the generated diagram:
This URL can be used to get the picture into your project readme file, Confluence wiki or just any web page. The interesting thing about it is that the picture itself isn’t stored on the demo server, because all the information is already encoded into the URL. So, just the URL is stored.
I think this tool is great to play with and explore. And these “diagrams” are great to store under source control, because all the changes are immediately readable in a diff. And it goes so much faster than drawing and repositioning all those blocks and swimlanes.
If you like the idea, by all means go and try the tool on your own! What I’ve shown here is just a very basic example, but I think one can do a lot with it. The website also has a FAQ to help people with some issues that may arise (I experienced none with the IDE plugin, but this tool has many integrations that I haven’t tried).
Not all of us are artists, but the great thing is, not all of us have to be.
Remote and distributed teams: fringe trend or the future of IT?
I have some experience with remote work, which I’ve shared in an article called Out of sight, out of mind, or How to be productive when working remotely. The topic still interests me, however, in more ways than just understanding how to make it work. The IT industry, while it hasn’t adopted the remote work approach on a global scale, does have big and successful companies that swear by it and want nothing else.