
Legacy data warehouse to data lake in the cloud: Open Heart Surgery

Many organizations are on a digital transformation journey in which they move their data processes from legacy systems to the cloud. The goal is better-structured data that yields valuable insights for improving all aspects of the business. However, this involves a lot of work, including, but not limited to:

  • Building a data lake on the cloud platform of choice
  • Building data access models
  • Establishing data governance
  • Building analytical workloads
  • Scheduling batch processes

This all sounds exciting and, undoubtedly, it is a great step towards a robust data platform that is ready for the future. However, before we solve the problems of the future, let’s look at the challenges of the present!

You might ask, why do we need to worry about the present? It is business as usual, isn’t it? While the “data engineers” are busy building that shiny new platform of the future, why should we be concerned about the present?

In this article, we discuss why the present is a burning issue and why building a new platform is nothing less than open-heart surgery. Organizations encounter several challenges while moving to the cloud.

Keeping the Patient Alive (Business as usual)

Teams that depend on data for their daily work need that data to be available on time, without fail. When the data is served from legacy systems, this typically means maintenance, and lots of it.

For example, at one of our clients, the legacy data warehouse runs on a single machine, and 12 teams depend on its data to make daily decisions that affect the running of the business. The system was not designed to handle the load it is subjected to and is prone to issues such as deadlocks and performance bottlenecks. This means that while the data engineering team is busy building a new data platform, it also needs to support the legacy system so that those 12 teams can do their jobs without negatively impacting the business.

This calls for patience and perseverance from everyone: those who are building the new platform and those who are supporting the legacy system (and often, the two are the same team).

Keeping the Blood Flowing (Data Pipelines)

Huge volumes of data, a large number of interdependencies between data processes, and various other factors make it almost impossible to move everything to the cloud in one go. Hence, this complex surgery needs to be broken down into manageable data streams that are moved one after another.

There are multiple ways of breaking it down, such as by dataset, by team, or by process dependency. There is no right or wrong answer. The best split depends on how each organization’s legacy system is structured and on which strategy makes the new system easier and quicker to build and deliver.
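
To illustrate the split by process dependencies, here is a minimal Python sketch that groups processes into migration waves, so that a process moves only after the processes it depends on. The process names and the upstream-first ordering are assumptions made for illustration, not a description of any particular client setup. The sketch relies on the standard-library graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def plan_migration_waves(dependencies):
    """Group processes into waves.

    `dependencies` maps each process to the set of processes it reads from.
    Every process in a wave depends only on processes in earlier waves.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        # Everything whose upstream processes are already planned.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical legacy processes and what each one depends on.
dependencies = {
    "sales_report": {"orders_load", "customers_load"},
    "orders_load": {"raw_ingest"},
    "customers_load": {"raw_ingest"},
    "raw_ingest": set(),
}

for number, wave in enumerate(plan_migration_waves(dependencies), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

For this hypothetical graph, the raw ingest lands in the first wave, the two loads in the second, and the sales report in the third. A circular dependency raises graphlib.CycleError, which is a useful early signal that the processes involved may have to move together.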

Giving the Patient a New Lease of Life (Data Catalog)

While the new platform is being built at full speed, piece by piece, there will be a transitional period during which various teams need to access data from both the legacy system and the cloud. For example, a team might need one dataset that is still waiting to be moved to the cloud and another that is already there. The solution is a transitional data catalog that spans the two systems. Further, two access mechanisms need to be operational during the transitional period.
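
As a rough sketch of what a catalog covering both systems could look like, the Python snippet below keeps one entry per dataset and records where that dataset currently lives, so that consumers can pick the matching access mechanism. The class names, dataset names, and addresses are all hypothetical. A real catalog would also track schemas, ownership, and access rights.

```python
from dataclasses import dataclass
from enum import Enum


class Location(Enum):
    LEGACY = "legacy"
    CLOUD = "cloud"


@dataclass(frozen=True)
class CatalogEntry:
    dataset: str
    location: Location
    address: str  # hypothetical table name or storage path


class TransitionalCatalog:
    """One lookup point for datasets on either side of the migration."""

    def __init__(self):
        self._entries = {}

    def register(self, dataset, location, address):
        self._entries[dataset] = CatalogEntry(dataset, location, address)

    def resolve(self, dataset):
        """Return the current entry, or raise KeyError for unknown datasets."""
        if dataset not in self._entries:
            raise KeyError(f"dataset not in catalog: {dataset}")
        return self._entries[dataset]

    def mark_migrated(self, dataset, cloud_address):
        """Point an already registered dataset at its new cloud address."""
        self.resolve(dataset)  # fail loudly if the dataset was never registered
        self.register(dataset, Location.CLOUD, cloud_address)


catalog = TransitionalCatalog()
catalog.register("orders", Location.LEGACY, "dwh.orders")
catalog.register("customers", Location.CLOUD, "lake/customers")

# Once the orders stream has moved, consumers resolve it to the cloud copy.
catalog.mark_migrated("orders", "lake/orders")
```

Consumers call resolve and branch on the returned location, which keeps the choice between the two access mechanisms in one place for as long as the transition lasts.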

With the numerous technological offerings available across multiple cloud vendors, it is now possible to build a data platform that meets any organization’s needs and offers high availability, scalability, and predictable performance. I will discuss these concepts in detail in another article, but their importance cannot be overstated.

Building a cloud-native platform might sound a little difficult. However, it is extremely important that organizations plan for it if they have not already decided to do so. They just need to be mindful of the challenges discussed in this article.

At IOVIO, our endeavor is to help organizations meet their data needs. We have strong expertise in building and maintaining cloud-native platforms that deliver high performance at any scale. We have experience in engineering data platforms for many marquee clients including ING, National Netherlande Investment Partners, Nike, and many more. We ensure business continuity for our clients while building data solutions customized to their needs. Our emphasis is on delivering value to our clients while focusing on our core values – disrupt, optimize, innovate.
