Metadata virtualization and orchestration seen as critical new technology to improve enterprise data integration

Metadata-driven data virtualization and improved orchestration can help provide the inclusion and scale to accomplish far better data management. Such access then leads to improved integration of all information into an approachable resource for actionable business activities.

Written by Dana Gardner, Contributor Dec. 16, 2011 at 3:08 a.m. PT

Listen to the podcast. Find it on iTunes/iPod. Read a full transcript or download a copy. Sponsor: Stone Bond Technologies.

The latest BriefingsDirect discussion targets the need to make sense of the deluge and complexity of the data and information that is swirling in and around modern enterprises. Most large organizations today are able to identify, classify, and exploit only a small portion of the total data and information within their systems and processes.

Perhaps half of those enterprises actually have a strategy for improving on this dismal fact. But business leaders are now recognizing that managing and exploiting information is a core business competency that will increasingly determine their overall success. That means broader solutions to data distress are being called for.

This discussion then examines how metadata-driven data virtualization and improved orchestration can help provide the inclusion and scale to accomplish far better data management. Such access then leads to improved integration of all information into an approachable resource for actionable business activities.

With us now to help better understand these issues -- and the market for solutions to these problems -- are Noel Yuhanna, Principal Analyst at Forrester Research, and Todd Brinegar, Senior Vice President for Sales and Marketing at Stone Bond Technologies. The panel is moderated by Dana Gardner, Principal Analyst at Interarbor Solutions. [Disclosure: Stone Bond is a sponsor of BriefingsDirect podcasts.]

Here are some excerpts:

Gardner: It’s hard to overstate that the size and rate of growth of data and information is just overwhelming the business world. Why is it a critical stage now to change how we're addressing these issues?

Yuhanna: We have customers who have 55,000 databases, and they plan to double this in the next three to four years. Imagine trying to manage 55,000 databases. It’s a nightmare. In fact, they don’t even know what the count is actually.

The data has been growing significantly over the last few years because of different application deployments, different devices, such as mobile devices, and different environments, such as globalization. These are obviously creating a bigger need for integration.

Then, they're dealing with unstructured data, which is more than 75 percent of the data. It’s a huge challenge trying to manage this unstructured data. Forget about the intrusions and the hackers trying to break in. You can’t even manage that data.

Then, obviously, we have challenges of heterogeneous data sources, structured, unstructured, semi-structured. Then, we have different database types, and then, data is obviously duplicated quite a lot as well. These are definitely bigger challenges than we've ever seen.

Different data sources

Gardner: We're not just dealing with an increase in data, but we have all these different data sources. We're still dealing with mainframes.

It seems to me that you can’t just deal with big data. You have to deal with the right data. What’s the difference between big data and right data?

Yuhanna: It’s like GIGO, Garbage In, Garbage Out. A lot of times, organizations that deal with data don’t know what data they're dealing with. They don’t know that it’s valuable data in the organization. The big challenge is how to deal with this data.

The other thing is making business sense of this data. That's a very important point. And right data is important. I know a lot of organizations think, "Well, we have big data, but then we want to just aggregate the data and generate reports." But are these reports valuable? Fifty percent of the time they're not, and they've just burned away 1,000 CPU cycles for this big data.

That's where there's a huge opportunity for organizations that are dealing with such big data. First of all, you need to understand what this big data means, and ask are you going to be utilizing it. Throwing something into the big data framework is useless and pointless, unless you know the data.

Brinegar: Noel is 100 percent correct, and it is all about the right data, not just a lot of data. It’s interesting. We have clients that have a multiplicity of databases. Some they don’t even know about or no longer use, but there is relevant data in there.

When you were talking about the ability to attach to mainframes and all legacy systems, and to incorporate them into today’s environments, that's really a big challenge for a lot of integration solutions and a lot of companies.

So the ability to come in, attach, get the right data, and make that data actionable and make it matter to a company is really key and critical today. And being able to do that with the lowest cost of ownership in the market and the best time-to-value equation -- so that companies aren’t creating a huge amount of tech on top of the tech that they already have to get at this right data -- that’s really the key critical part.

Gardner: What’s with this notion about orchestrating, metadata, and data virtualization? Why are some of these architectural approaches being sought out, especially for real-time uses?

Holistic data set

Yuhanna: You have to look at the holistic data set. Today, most organizations or business users want to look at the complete data sets in terms of how to make business decisions. Typically, what they're seeing is that data has always been in silos, in different repositories, and different data segregations. They did try to bring this all together like in a warehouse trying to deliver this value.

But then the volumes of data, the real-time data needs are definitely a big challenge. Warehouses weren't meant to be real-time. They were able to handle data, but not in real time.

So this whole data segregation delivers an even better framework to deliver real-time data and the right data to consumers, processes, and applications, whether it’s structured, semi-structured, or unstructured data, all coming together from different sources -- not only on-premise but also off-premise, such as partners' data and marketplace data -- and providing that framework toward different elements.

We talked about this many years ago and called it the information fabric, which is basically data virtualization that delivers this whole segregation of data in that layer, so that it could be consumed by different applications as a service, and this is all delivered in a real-time manner.

Now, an important point here is that it's not just read-only, but you can also write back through this virtualized layer, so that it can get back at the data.
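To make the read-and-write point concrete, here is a minimal Python sketch of a virtualized layer. It is an illustration under stated assumptions, not any vendor's implementation: `VirtualLayer`, `crm`, and `billing` are hypothetical names, and two in-memory dictionaries stand in for real databases.

```python
# Illustrative only: two in-memory "sources" stand in for real databases.
# VirtualLayer, crm, and billing are hypothetical names, not a vendor API.

class VirtualLayer:
    """Presents several sources as one logical view without copying the data."""

    def __init__(self, sources):
        # sources: mapping of source name -> {record id -> {field: value}}
        self.sources = sources

    def read(self, record_id):
        """Merge the fields held for record_id across every source, at query time."""
        view = {}
        for table in self.sources.values():
            for field, value in table.get(record_id, {}).items():
                view[field] = value
        return view

    def write(self, record_id, field, value):
        """Write back through the layer to every source that already holds the field."""
        updated = []
        for name, table in self.sources.items():
            if field in table.get(record_id, {}):
                table[record_id][field] = value
                updated.append(name)
        return updated


crm = {"c1": {"name": "Acme", "email": "old@acme.example"}}
billing = {"c1": {"email": "old@acme.example", "balance": 120}}
layer = VirtualLayer({"crm": crm, "billing": billing})

combined = layer.read("c1")                               # one view over both sources
touched = layer.write("c1", "email", "new@acme.example")  # write-back reaches both copies
```

The data stays in its original sources; the layer only assembles a view on request and routes an update to each source that holds the field. A real product would also have to handle transactions, security, and conflicting values, which this sketch ignores.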

Definitely, things have changed with this new framework, and there are solutions out there that offer this whole framework: not only accessing and integrating data, but also frameworks that include metadata, security, integration, and transformation.

Gardner: For the companies that you work with at Forrester, when they do this correctly, what sort of benefits are they able to gain?

Yuhanna: The good thing about data virtualization is that it's not just a single benefit. There are many, many benefits of data virtualization, and there are customers who are doing real-time business intelligence (BI) with data virtualization. As I mentioned, there are drawbacks and limitations in some of the older approaches, technologies, and architectures we've used for decades.

We want real-time BI, in the sense that you can’t just wait a day for this report to show up. You need this every hour or every minute. So these are important decisions you've got to make for that.

Real-time BI is definitely one of the big drivers for data virtualization, but also having a single version of the truth. As you know, more than 30 percent of data is duplicated in an organization. That’s a very conservative number. Many people don’t know how much data is duplicated.

And you have duplication of different data -- customer data, product data, or internal data. There are many different types of data that are duplicated. Then the data has a quality issue, because you may change customer data in one application that touches one database, but the other database is not synchronized. What you get is inconsistent data, and customers and other business users don’t really value the data anymore.
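The synchronization problem described here can be sketched in a few lines of Python. This is a hypothetical illustration, assuming each database is simply a list of records keyed by `customer_id`; the names `find_conflicts`, `orders_db`, and `support_db` are invented for the example.

```python
# Illustrative only: lists of dicts stand in for tables in separate databases.

def find_conflicts(databases, key):
    """Return {record key: {field: {value: [database names]}}} for fields whose values disagree."""
    seen = {}
    for db_name, rows in databases.items():
        for row in rows:
            for field, value in row.items():
                if field == key:
                    continue
                # Track which databases report which value for this record's field.
                holders = seen.setdefault(row[key], {}).setdefault(field, {})
                holders.setdefault(value, []).append(db_name)

    conflicts = {}
    for record, fields in seen.items():
        # A field is in conflict when more than one distinct value was reported.
        disagreeing = {f: vals for f, vals in fields.items() if len(vals) > 1}
        if disagreeing:
            conflicts[record] = disagreeing
    return conflicts


databases = {
    "orders_db": [{"customer_id": 7, "city": "Houston"}],   # updated by one application
    "support_db": [{"customer_id": 7, "city": "Austin"}],   # never synchronized
}
conflicts = find_conflicts(databases, "customer_id")
```

Here customer 7 has two different cities on record, one per database, which is the kind of disagreement that a single version of the truth is meant to eliminate.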

A single version of the truth is a very important deliverable from solutions, which has never been done before, unless you have one single database actually, but most organizations have multiple databases.

It's also about creating this whole dashboard. You want to get data from different sources and be able to present business value to the consumers, to the business users, what have you. And in other cases, like enterprise search, you're able to search data very quickly.

Simpler compliance

Imagine an auditor walks into an organization and wants to look at data for a particular event, activity, or customer, searching across a thousand resources. It could be a nightmare. The compliance initiative becomes a lot simpler through data virtualization.

Then, you're doing things like content-management applications, which need to be delivered in federation and integrate data from many sources to present more valuable information. Also, smart phones and mobile devices want data from different systems so that they all tie together to their consumers, to the business users, effectively.

So data virtualization has quite a strong value proposition and, typically, organizations get the return on investment (ROI) within six months or less with data virtualization.

Brinegar: This is exactly the fabric and the framework that Enterprise Enabler, Stone Bond’s integration technology, is built on.

What we've done is look at it from a different approach than traditional integration. Instead of taking old technologies and modifying those technologies linearly to effect an integration and bring that data into a staging database and then do a transformation and then massage it, we've looked at it three-dimensionally.

We attach with our AppComms, which are our connectors, to the metadata layer of an application. We don’t agent within the application. We get at the data of the data. We separate that data from multiple sources, unlimited sources, and orchestrate that to a view that a client has. It could be Salesforce.com, SharePoint, a portal, Excel spreadsheets, or anything that they're used to consuming that data in.

Actionable data

Gardner: Just to be clear, Todd, your architecture and solution approach is not only for access for analysis, for BI, for dashboards and insights -- but this is also for real-time running application sets. This is actionable data?

Brinegar: Absolutely. With Enterprise Enabler, we're not only a data-integration tool, we're an applications-integration tool. So we are EAI/ETL. We cover that full spectrum of integration. And as you said, it is the real-time solution, the ability to access and act on that information in real time.

The ability Enterprise Enabler provides to virtualize, federate, and orchestrate, all in real time, is a huge value. The biggest thing is time to value, though. How quickly can they get the software configured and operational within their enterprise? That is really the key that is driving a lot of our clients’ actions.

When we do an installation, a client can be up and operational doing their first integration transformations within the first day. That’s a huge time-to-value benefit for that client. Then, they can be fully operational with complex integration in under three weeks. That's really astounding in the marketplace.

I have one client that on one single project calculated $1.5 million cost savings in personnel in the first year. That’s not even taking into account a technology that they may be displacing by putting in Enterprise Enabler. Those are huge components.

HP is a great example. HP runs Enterprise Enabler in their supply chain for their Enterprise Server Group. That group provides data to all the suppliers within the Enterprise Server Group on an on-time basis.

They are able to build on demand and take care of their financials in the manufacturing of the servers much more efficiently than they ever have. They were experiencing, I believe, a 10-times return on investment within the first year. That’s a huge cost benefit for that organization. It's really kept them a great client of ours.

We do quite a bit of work in the oil business and the oil-field services business, and each one of our clients has experienced a faster ROI and a lower total cost of ownership (TCO).

We just announced recently that most of our clients experienced a 300 percent ROI in the first year that they implemented Enterprise Enabler. CenterPoint Energy is a large client of Stone Bond and they use us for their strategic transformation of how they're handling their data.

How to begin

Gardner: Let’s go back to Noel. Do you have a sense of where companies that are successful at doing this have begun?

Yuhanna: One is taking an issue, like an application-specific strategy, and building blocks on that, or maybe just going out and looking at an enterprise-wide strategy. For the enterprise-wide strategy, I know that some of the large organizations in the financial services, retail, and sales force are starting to embark on looking at all of these data in a more holistic manner:

"I've got customer data that is all over the place. I need to make it more consistent. I need to make it more real-time." Those are the things that I'm dealing with, and I think those are going to be seen more in the coming years.

Obviously, you can’t boil the ocean, but I think you want to start with some data which becomes more valuable, and this comes back to the point that you talked about as the right data. Start with the right data and look at those data points that are being shared and consumed by many users, business users, and that’s going to be valuable for the business itself.

The important thing is also that you're building this block on the solution. You can definitely leverage some existing technologies, if you wanted to. I would definitely recommend now looking at newer technologies, because they definitely are faster. They do a lot of caching. They do a lot of faster integration.

As Todd was mentioning, quicker ROI is important. You don’t have to wait for a year trying to integrate data. So I think those are critical for organizations going forward. But you also have to look at security, availability, and performance. All of these are critical when you're making decisions about what your architecture is going to look like.

We've actually done extensive research over the last four or five years on this topic. If you look at Information Fabric, this is a reference architecture we've told customers to use when you're building a data virtualization yourself. You can build the data virtualization yourself, but obviously it will take a couple of years to build. It’s a bit complex to build, and I think that's why solutions are better at that.

But Information Fabric reports are there. Also, information as a service is something that we've written about -- best practices, use cases, and also vendor solutions around this topic of discussion. So information as a service is something that customers could look at and gain understanding.

Case studies

We have use cases or case studies that talk about the different types of deployments, whether it’s a real-time BI implementation, a single version of the truth, fraud detection, or any other type of environment. So we definitely have case studies as well.

There are case studies, reference architectures, and even product surveys, which talk about all of these technologies and solutions.

Gardner: Todd, how about at Stone Bond? Do you have some white papers or research reports that you can point to in order to help people sort through this and perhaps get a better sense of where your technologies are relevant and what your value is?

Brinegar: We do. On our website, stonebond.com, we have the blog of our CTO, Pamela Szabó, which has a great perspective on data, big data, and the changing face of data usage and virtualization.

I wish everybody would explore the different opportunities and the different technologies that there are for integration and really determine not what you need today -- that’s important -- but what will you need tomorrow. What’s the tech that you're going to carry forward, and how much is the TCO going to be as you move forward, and really make that value decision past that one specific project, because you're going to live with the solution for a long time.
