A real story about transforming an app with a Kubernetes cluster

Every project has two sides: the positive one, with its benefits and outcomes, and the negative one, with the unforeseen challenges discovered while the project was being developed. In this blog post, I’ll be talking about the not-so-bright side of a project, the one that brings up specific challenges and makes the team a little bit nervous.

As for most software developers at Devoteam, my main obligations on previous projects were to write good code that fulfilled user requirements and followed a set of engineering principles, and to deploy it to a development and/or user acceptance testing environment.

On my last project, though, I was part of the team that deployed to a production environment, where the application is used by actual clients and customers.

This particular project centered on transforming an existing monolithic application into one with a microservices architecture, consisting of multiple smaller services running independently on a Kubernetes cluster.

The architects, business analysts, and team leads, who were quite experienced, took the necessary steps to ensure a successful transition to production. We had a well-thought-out plan and executed it superbly. The deployment was successful. However, once the project was live, several issues surfaced that we had not predicted. The two key takeaways for me from this experience are:

importance of stress testing
data migration and date field discrepancies

Importance of stress testing

One of the biggest issues in the production environment was a major drop in overall speed. Several factors contributed to it:

excessive backend API calls and unnecessary imports on the frontend,
bottlenecks and suboptimal framework utilization on the backend,
slow fetching of information because of a significant increase in the data set.

I will go through each of these, and our solutions, in the following sections.

Frontend optimization

The frontend contributed the least to the performance decrease. One small issue was unused imports, which we solved quickly by deleting them.

Next, some services were provided in the root of the application (‘providedIn: root’). As a result, all of these services had to be set up before the app was running. We resolved this by removing the root-level configuration where we deemed it obsolete, and by providing those services in the modules that actually use them, which allowed for better ‘lazy loading’.

Finally, in our application the user can either view or edit a document. However, not all functionality is enabled in ‘view’ mode, so some API calls to the backend were redundant. By checking which mode was active when a component was instantiated, we reduced the number of backend API calls and, as a result, sped up the application.
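The mode check can be sketched in a few lines. This is only an illustration of the idea, not our actual component code: the Mode type, the endpoint paths, and the requestsFor function are all hypothetical names made up for the example.

```typescript
// Hypothetical sketch: decide which backend requests a document screen
// needs based on the mode it was opened in.
type Mode = "view" | "edit";

// Assumed endpoint paths, for illustration only.
const ALWAYS_NEEDED: string[] = ["/documents/:id"];
const EDIT_ONLY: string[] = ["/documents/:id/permissions", "/templates"];

// Returns the list of requests to issue when the component is instantiated.
function requestsFor(mode: Mode): string[] {
  // In "view" mode the edit-only data is never shown, so it is never fetched.
  return mode === "edit" ? [...ALWAYS_NEEDED, ...EDIT_ONLY] : [...ALWAYS_NEEDED];
}

const viewRequests = requestsFor("view"); // one request
const editRequests = requestsFor("edit"); // three requests
```

The component then loops over the returned list instead of firing every request unconditionally.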

Backend optimization

Optimizing the backend services required more effort than the frontend, simply because the backend code spans multiple microservices. We inspected every service in which we suspected a bottleneck, and each of them presented a unique problem. In this section, we’ll take a look at some of them.

One of the things reported shortly after the production deployment was that the application seemed to perform better at certain times of the day and slow down at others. We expanded our logging, which gave us insight into what caused the decrease in performance and led us to some interesting discoveries.

We use Keycloak, an open-source identity and access management tool, to manage users, roles, and permissions for the application. It stores this sensitive information in its own MySQL database. We found that our code repeatedly queried that database for data that doesn’t change once everything is set up. By implementing a simple cache in the service layer, such as a hash map in which we store the roles, we can bypass the query whenever the desired role is already in the map. And if a role is updated in the database, we update it in the cache as well.
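A minimal sketch of such a role cache is shown here. It is not our production code: the Role shape, the RoleLoader type, and the RoleCache class are hypothetical, and the loader stands in for whatever query actually fetches a role.

```typescript
// Hypothetical role shape; the real one depends on the Keycloak setup.
interface Role {
  name: string;
  permissions: string[];
}

// Stand-in for the real database query (an assumption for this sketch).
type RoleLoader = (name: string) => Promise<Role>;

class RoleCache {
  private readonly roles = new Map<string, Promise<Role>>();
  private readonly loadRole: RoleLoader;

  constructor(loadRole: RoleLoader) {
    this.loadRole = loadRole;
  }

  // Returns the cached role, querying the database only on a cache miss.
  get(name: string): Promise<Role> {
    const cached = this.roles.get(name);
    if (cached !== undefined) return cached;

    const pending = this.loadRole(name);
    this.roles.set(name, pending);
    // Do not keep a failed lookup in the cache, so the next call retries.
    pending.catch(() => {
      if (this.roles.get(name) === pending) this.roles.delete(name);
    });
    return pending;
  }

  // Called whenever a role is changed in the database, to keep the cache in sync.
  update(role: Role): void {
    this.roles.set(role.name, Promise.resolve(role));
  }
}
```

Caching the promise rather than the resolved value means that two requests arriving at the same time share a single query.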

Another interesting finding was that we hadn’t leveraged the capabilities of MongoDB and its aggregation pipelines enough. This became evident once the data migration was complete and the size of the data grew from a couple of thousand documents to tens of thousands. MongoDB was our choice for storing pretty much all of the application-generated data. By rewriting code in the repository layer of a service from one that depends on TypeScript logic to one that uses MongoDB’s aggregation pipeline, we moved the load of the execution from the service to the database. As a result, overall performance improved.
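To illustrate the kind of rewrite involved, here is a sketch under assumed names. The Appointment shape and its fields are hypothetical and not taken from our schema; the point is only to show the same filter-and-sum logic written once in TypeScript and once as aggregation stages.

```typescript
// Hypothetical document shape, for illustration only.
interface Appointment {
  clientId: string;
  status: "scheduled" | "completed";
  durationMinutes: number;
}

// Before: every document is loaded into the service and reduced in TypeScript.
function totalMinutesInService(appointments: Appointment[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const a of appointments) {
    if (a.status !== "completed") continue;
    totals.set(a.clientId, (totals.get(a.clientId) ?? 0) + a.durationMinutes);
  }
  return totals;
}

// After: the same filter and sum expressed as aggregation stages, so that
// only one small result document per client leaves the database.
const totalMinutesPipeline = [
  { $match: { status: "completed" } },
  { $group: { _id: "$clientId", totalMinutes: { $sum: "$durationMinutes" } } },
];
```

With the official Node.js driver, a pipeline like this would be passed to the collection’s aggregate method instead of fetching all documents first.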

This worked well for operations that update data, but we also needed to speed up fetching information from the database, and we could not do that by improving the code alone. However, there was something we could do on the database itself: add indexes. An index allows MongoDB to limit the number of documents it needs to inspect, and thus perform queries faster. We analyzed each collection in every database to identify the most suitable indexes for each of them, and once they were in place, the application was running fast once again.
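As a rough sketch of what declaring indexes can look like, consider the following. The field names and the ensureIndexes helper are hypothetical, and IndexableCollection is a minimal interface written for this example; it is shaped like the createIndex method of the official Node.js driver’s collection, but it is not the driver’s own type.

```typescript
// Index keys: field name mapped to sort direction (1 ascending, -1 descending).
type IndexKeys = Record<string, 1 | -1>;

// Minimal interface for this sketch, modelled on the driver's createIndex.
interface IndexableCollection {
  createIndex(keys: IndexKeys): Promise<string>;
}

// Assumed fields. The compound index serves queries that filter on clientId
// and sort by scheduledAt, newest first; the second one serves status filters.
const appointmentIndexes: IndexKeys[] = [
  { clientId: 1, scheduledAt: -1 },
  { status: 1 },
];

// Creates every listed index and resolves with the generated index names.
function ensureIndexes(
  collection: IndexableCollection,
  indexes: IndexKeys[],
): Promise<string[]> {
  return Promise.all(indexes.map((keys) => collection.createIndex(keys)));
}
```

Whether a given index actually helps is best confirmed by comparing the number of documents examined in the query’s explain output before and after adding it.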

Data migration and date field discrepancies

One of the main requirements was that the data from the previous version remain usable. This was made possible by our data analytics and migration team, who mapped the existing fields to a new set of fields and migrated the data in batches, with intermittent testing to make sure that the application would run smoothly.

Everything would have gone perfectly well, except that we hadn’t thought of one issue that would eventually arise from this: the migration and storage of date fields.

Most of the data documents contain various date fields: start and end dates of certain periods, date of birth of both clients and employees, dates of certain activities, etc.

The dates themselves were easy to migrate, since a date in one JavaScript application is interpreted the same way in another. The issue was that dates in documents newly created in the new version of the application are stored as date-times with the time set to midnight. Not only that: because local midnight in the Central European time zone is converted to UTC when stored, the date was stored as the day before the specified date, with the time set to 23:00.

For example, if we were to create a new client with a date of birth set as 01.03.2007, the stored date would look like this: “2007-02-28T23:00:00Z”.

In other cases, the application stores date-time objects, such as the date and time of scheduled appointments, dates of creation and modification of documents, etc. Here the stored value includes hours and minutes, but again it is converted from Central European Time to UTC, which deducts one hour from the date-time object (e.g. an appointment scheduled for 20.01.2022 at 15:30 would be stored as “2022-01-20T14:30:00Z”).
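Both examples can be reproduced with the built-in JavaScript Date object. The sketch below assumes a fixed offset of one hour (Central European winter time) so that it behaves the same on any machine; the localToUtc helper and the constant are names made up for this illustration.

```typescript
// Assumption for the sketch: Central European (winter) Time is UTC+1.
const CET_OFFSET_MINUTES = 60;

// Converts a wall-clock date and time in a fixed-offset zone to the UTC instant.
function localToUtc(
  year: number,
  month: number, // 1-based, unlike the Date API
  day: number,
  hour: number = 0,
  minute: number = 0,
  offsetMinutes: number = CET_OFFSET_MINUTES,
): Date {
  const asIfUtc = Date.UTC(year, month - 1, day, hour, minute);
  return new Date(asIfUtc - offsetMinutes * 60 * 1000);
}

// Date of birth 01.03.2007, entered as local midnight:
const dateOfBirth = localToUtc(2007, 3, 1).toISOString();
// "2007-02-28T23:00:00.000Z"

// Appointment on 20.01.2022 at 15:30 local time:
const appointment = localToUtc(2022, 1, 20, 15, 30).toISOString();
// "2022-01-20T14:30:00.000Z"
```

The date of birth ends up on the previous calendar day purely because midnight minus one hour crosses the day boundary.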

The discrepancy between the date formats in the old and new applications broke some functionality. This was a very significant problem, and no single solution would mitigate every aspect of it. Certain job schedulers performing various updates on the Kubernetes cluster require a strict date and time interpretation in the Central European time zone, while other parts of the application are time zone agnostic and transform the displayed date to the time zone of the user’s device.

One possible solution is to store every date consistently in UTC so that it’s time zone agnostic. That would mean refactoring the parts of the backend code where a specific time zone is required, but this is achievable by using a library like Moment to transform the date into the correct time zone before proceeding with code execution.
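A rough sketch of this approach follows. Note that it swaps in the built-in Intl.DateTimeFormat API in place of Moment, purely to keep the example free of dependencies. The helper names are hypothetical, and "Europe/Berlin" is used only as an example of a zone that follows Central European time.

```typescript
// Store a date-only value (e.g. a date of birth) as midnight UTC, so the
// calendar day never shifts regardless of where it was entered.
function toDateOnlyUtc(year: number, month: number, day: number): Date {
  return new Date(Date.UTC(year, month - 1, day));
}

// Convert a stored UTC instant to the wall-clock time of a given IANA time
// zone, for code (such as a scheduler) that needs a specific zone.
function wallClockIn(instant: Date, timeZone: string): string {
  // "en-GB" is chosen because it formats with a 24-hour clock.
  const parts = new Intl.DateTimeFormat("en-GB", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
  }).formatToParts(instant);
  const pick = (type: Intl.DateTimeFormatPartTypes): string =>
    parts.find((p) => p.type === type)?.value ?? "";
  return `${pick("year")}-${pick("month")}-${pick("day")} ${pick("hour")}:${pick("minute")}`;
}

const birthDate = toDateOnlyUtc(2007, 3, 1).toISOString().slice(0, 10);
// "2007-03-01"

const winter = wallClockIn(new Date("2022-01-20T14:30:00Z"), "Europe/Berlin");
// "2022-01-20 15:30" (UTC+1 in winter)

const summer = wallClockIn(new Date("2022-07-20T13:30:00Z"), "Europe/Berlin");
// "2022-07-20 15:30" (UTC+2 in summer)
```

Using a named time zone rather than a fixed one-hour offset matters here, because the offset changes with daylight saving time.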

Conclusion

Finally, I’d like to add one more thing that I think is a key factor in a project’s success: a collaborative team whose members have diverse skill sets. No matter what the challenges of a project are, a harmonious team will overcome them. Throughout this period we bounced ideas off each other, tried new things, and in the end, we succeeded.

This experience showed just how much goes into planning and executing not just the deployment to production, but the project as a whole. And this is also what I love about my job: there is always something more to learn.