Season 1 / Episode 11

Why Evolving Your Enterprise Data Strategy is Essential for AI Success

with:
Marc LeBlanc
Host
Amna Jamal
Guest
"The AI space is all built on top of data, but it's evolving really, really fast. I feel like organizations are still catching up, because they don't have the right data strategy."
Amna Jamal
Data and AI Market Leader for Canada at IBM

About the Episode

With enterprises scrambling to bring in artificial intelligence (AI) and large language models (LLMs), many are learning the importance of clean data and strong data strategy.

In this episode, Amna Jamal from IBM joins host Marc LeBlanc to talk about data strategy in the age of AI. Working with clients to help them address their data and AI challenges, Amna shares a unique perspective on the importance of a well-developed data strategy for AI success within enterprises. Covering everything from practical ways to identify the skill sets you need to achieve data and AI success within your organization to the opportunities AI creates for businesses, Amna and Marc underscore the inextricable link between data and AI.

Transcript

Amna Jamal: [00:00:00] Data will always be, I think, the center of attention. And having the right data strategy in place will ensure that you are getting the right insights, complying with the regulations and that the time to value is fast. And it's because you have the right data strategy.

And the AI space. Again, it's all built on top of data, but it's evolving really, really fast and I feel like organizations are still catching up. With AI, it's just the data sprawl. Like how do you consolidate data? Make sure that the quality of data is good. Because with large language models, if you don't have good quality data, how do you build a model which is trusted and gives out good results?  

So essentially it's garbage in, garbage out. Data quality is bad. Then the insights that you're [00:01:00] generating are bad as well, right?  

So if you have the right data strategy in place, you are better able and better equipped to fulfill the KPIs and metrics that you are gauged on, or that any organization is gauged on.  

Marc LeBlanc: This is Solving For Change the podcast where you'll hear stories from business leaders and technology industry experts about how they executed bold business transformations in response to shifts in the market or advances in technology. In every episode, we'll explore real world strategies and technologies that fuel successful evolution.

I'm your host this month Marc LeBlanc.  

I'm joined today by Amna, a technical leader with IBM in the data and AI space. Amna, thanks for joining me today.  

Amna Jamal: Thank you for having me today.  

Marc LeBlanc: Just to set the stage, maybe give us a sense of what your role is today. What sort of things are you working on?  

Amna Jamal: My name is Amna.

I'm from IBM. I've been with IBM for about three, three and a half years now, and I've done everything under data and AI space. So, now I lead data [00:02:00] and AI from the technical front for IBM, which means talking to clients, understanding their business needs and how to address those needs and challenges with the help of product offerings that IBM has to offer.  

Marc LeBlanc: And there's a lot with everything going on with AI, data seems to be shoved more to the forefront than it ever has been. What are some of the shifts that you're seeing that organizations have to tackle when it comes to their data today?

Amna Jamal: The data problems are still the same, where data resides in silos. They don't know how to resolve those silos across the organization. So, essentially how do you get the integration tooling? How do you understand the data? How do you make the data available to the right people in a timely manner?

So going back to what you asked, with AI, it's just the data sprawl. Like how do you consolidate data, make [00:03:00] sure that the quality of data is good. Because you know, with large language models, if you don't have good quality data, how do you build a model which is trusted, and gives out good results?

So essentially it's garbage in, garbage out. And that I've seen across decades. It has been the problem.  

Marc LeBlanc: Yeah. So, you mentioned one of the issues that we have with data sprawl and you also mentioned the other one being data in silos. What's an example of what that looks like?

How would someone with an organization identify that?  

Amna Jamal: That's a very good question.  

So I'll maybe first explain why the silos, and one example that I like to give is that IBM has this good or bad habit of acquiring so many different organizations. So, which means that the organizations and the companies that they're acquiring, they're using different tool sets. They have different skill levels, different people, and [00:04:00] they have, of course, data residing in different databases. It forms a silo, right?

So the bigger the organization, the less communication between these departments and when you want to have a holistic picture of, let's say if I'm signed up with a bank, I have different products, but if the bank was to give me a promotional offer, they need to look at a complete picture of me as an individual or customer. So how do you resolve those silos is where you go identify where the data is residing. What is the need of the hour? Can you bring in the data? Can you give meaning to that data because you know the column data and sometimes the columns are just x, y, z, underscore, a, b, c.

I don't understand what that data is. So, essentially there needs to be a lens where you can discover the data, you can give meaning to the data, and then that is how you can resolve the silos. [00:05:00]  

Marc LeBlanc: So am I hearing you correct in that one of the first steps in resolving those data silos is doing a bit of a data discovery?

Would that be the right way to look at it?  

Amna Jamal: Yes, absolutely.  

Marc LeBlanc: And what are some of the skill sets? So, if I had an organization and I had a sense that I've got some of these data silos, maybe you've done some acquisitions or maybe you have some legacy infrastructure, technology somewhere. How do I get ready to go and tackle that problem?

Amna Jamal: That's a very good question. So, what I would like to say is that, you need to have a data strategy in place and the skills will follow.  

So, what do you want to achieve? Do you want to build a lakehouse? A lakehouse because you have some workloads which will help you to build let's say, or facilitate data science use cases. Or there could be use cases which are more on the analytics sides of things.

When I say [00:06:00] analytics, it means that you are powering dashboards and you're building reports, which means that you want to use the engines which can produce those insights in seconds rather than minutes or hours. But if you're a data scientist and you're building machine learning models, then you can't wait for the data and the compute. So, with data lakehouse, you can bring in the data and use the engines which will support the use cases that you have, right?

For that you have to see where the data warehouses are. Are you using object storage? Which compute engines are you using? And so on.  

So, if that's the strategy, the second part would be that data administrators, data owners, who have the understanding of where data is residing, they can make those sources available to be brought into the data lakehouse.  

And then there are different personas. Then there would be the stewards of the data who would [00:07:00] basically do the dirty work of mapping data reports to the data that is residing in a warehouse, lakehouse, or object storage.

And then come the engineers who now need to do the work of data integration, data understanding, data governance folks and so on. So, essentially once you have the data strategy in mind, the personas and the skill sets will follow.

But without that, it will just be confusing for... Not confusing, time consuming to sort out the data.  

Marc LeBlanc: So I have a couple of follow up questions out of that. But probably my first one, if I go back to what you started with around the data strategy, if someone was defining a data strategy, are there key attributes that you would expect to have?

What makes up a good strategy?  

Amna Jamal: The strategy, there are two approaches to it. One is [00:08:00] that as an organization, I want to have access to good quality data in a timely manner. Like as a business owner, or a business user. I don't care about where the data is coming from. I have certain SLAs that if I'm building a report, it should refresh in three seconds or one second or two seconds.

If I'm building a machine learning model, the SLA, or the KPI, for me would be that the model should perform well and there's no data loss in between.

So, going back to your question, if you are thinking of it from a business user strategy or to facilitate the business users, you would want to have the capability to resolve those silos in a timely manner, giving meaning to the data and also while being compliant with the privacy and all the regulations that are coming out.

If that's your thinking, then you need to go back [00:09:00] with what is available out there. Which tool sets do you need and how are those tool sets basically addressing your needs? Can you get data in a timely manner? Can you mask the data at rest, in flow? If there is a data issue, how can you proactively observe what is happening on all of those pipelines and integration pipelines that you have written?  

So observability is a key. Data integration is one of the biggest things, like how do you connect to data? Like, you have data residing on premises, you have data residing on cloud for a bigger organization, and that is how the silos are formed.

So, how do you integrate with that data? Do you want to virtualize it where you don't want to move the data, you just want to move the aggregated results, which will facilitate the business users? Or do you want to replicate from, let's say, like if it's on mainframe, you want to bring it [00:10:00] onto a database on cloud? So, how do you replicate it? So, that is one strategy that you would have to think about, which will of course change based on the business need of the organization. Second is, once you have formed integration pipelines and you can observe those pipelines so you know if something goes wrong, how do you observe it? Like how do you do a root cause analysis? How do you know which pipeline is affected if one is failing, and so on. So you have observability on top of it.
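The question of which pipelines are affected when one fails can be illustrated with a small lineage walk. This is a minimal Python sketch, not a description of any IBM product: the pipeline names, the DOWNSTREAM mapping, and the affected_pipelines function are all hypothetical, and it assumes the lineage between pipelines is already known.

```python
from collections import deque

# Hypothetical lineage: each pipeline maps to the pipelines that consume its output.
DOWNSTREAM = {
    "mainframe_replication": ["customer_consolidation"],
    "crm_ingest": ["customer_consolidation"],
    "customer_consolidation": ["churn_report", "promo_model"],
    "churn_report": [],
    "promo_model": [],
}


def affected_pipelines(failed, downstream):
    """Return every pipeline downstream of `failed`, in breadth-first order."""
    seen = set()
    order = []
    queue = deque(downstream.get(failed, []))
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already reached through another path
        seen.add(name)
        order.append(name)
        queue.extend(downstream.get(name, []))
    return order


if __name__ == "__main__":
    # A failure in the CRM ingest affects everything built on top of it.
    print(affected_pipelines("crm_ingest", DOWNSTREAM))
```

Walking the same mapping in the opposite direction (from a broken report back to its sources) would give a starting point for root cause analysis.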

Once you have integrated, once you have moved data from source to target, the second thing is that you want to govern all of that. Meaning like you bring in the metadata, you give meaning to that metadata so you know what exactly is in the tables and the data that you're bringing in, while also enforcing the privacy and the regulations that you have, so you're compliant with the internal and external regulations and the sensitive information is masked [00:11:00] for usage and so on.

And the third and the biggest thing is that how do you give access to those users, like the business without having them to wait for like weeks or three weeks to get data access. So, how can you use data as a product, and facilitate the needs of the users, the business users? So, again, going down to what is the need? The way to facilitate those needs is simpler than setting out the data strategy.  

Marc LeBlanc: Okay. I'm gonna try to summarize because there's a lot in there.

Amna Jamal: Yeah.

Marc LeBlanc: I kind of heard, number one, understand what the business needs. What is the outcome you're trying to achieve? The sources, the strategy behind that shouldn't matter too much, just understand clearly what the business goal is. The second one was around making sure that you understand how you're going to govern, observe, make sure that things are getting into there so you have good quality data. And the third [00:12:00] one was around providing access to the consumers of the data. Do I have that right?

Amna Jamal: Absolutely.  

Marc LeBlanc: Excellent. Awesome. The other half of my question from an earlier question was you were touching on some of the personas: the architect, the stewards of the data, the engineers.

Tell me, is there a bit of a difference between those roles in the data world versus a consumer, or is that encompassed within those personas?

Amna Jamal: I would say again, these roles are very generic roles and in every organization, these roles would be different. For instance, if you're talking about a startup, there's one person, but they will be wearing different hats. You know, data steward, data engineer, data scientist, CEO, CTO and so on. Like that would be one person.  

But for bigger organizations, there are distinct roles which they'll be doing those tasks. So, they [00:13:00] could be business users as well, depending on the use case that they're working on.

But for instance, essentially what has happened in the past was that data scientists would do all the data science work for all the organizations, but now even the business users want access to it, right? So it's not just the data scientist who would be working with the platform or working with the data, it could be the business users, analysts who want to look at the reports.

But from the data side of things, data owners, they know where the data is. So, I have not seen the intersection between the business users and deeply technical data users as well. But, with how technology is evolving, business users can feel what data owners feel, too. But the experience would be different across the tool set.

Marc LeBlanc: Right. [00:14:00] And, you know, thinking of an organization that maybe they do have a data strategy, what sort of skill sets should they be investing in in their people?  

Amna Jamal: That's a very good question.

So, I think skill set is just, skill set in terms of understanding where technology is moving to. So, they would have data owners, data engineers, data stewards and data scientists. But I think the need of the hour would be how can these people get hands-on with the right technology to fulfill their business needs.  

So oftentimes, all these organizations go through a lot of POCs, proofs of concept, where they do co-create and they do work with this technology. But every other vendor, including IBM, has a tool set for that. So how do you [00:15:00] essentially figure out which tool is better than the other?

So in terms of skill set, getting hands-on is one thing, but understanding the vision and where it can help the enterprises is another thing. So, I guess equip them with the right technology, but also the vision where the organization is leading. So, everyone is equipped to make the decision and weigh in when it comes to making a data decision or strategy.  

Marc LeBlanc: So, are you saying that that should tie back to your data strategy? So if you have a well-defined data strategy, you talked about number one, understanding that outcome.  

Amna Jamal: Yeah.

Marc LeBlanc: If that's well defined and well understood and communicated out to the organization, that's kind of step one into finding the right skill sets and understanding the right tools.

Amna Jamal: Yeah.  

Marc LeBlanc: Awesome. Shifting a little bit, we've talked a little bit about what goes into some of these data [00:16:00] questions.

I'm wondering what are you seeing on the opportunity if an organization gets this right, what are the opportunities that are presented to them today?

Amna Jamal: It's so many. There's so many. Let me think of an example.  

I'm a bank, right? And one of the KPIs, or one of the biggest things that I want is customer retention because the customers are churning, there's so many promotions. So, how do you retain this customer, because getting a new customer or a client, for a banking organization or any other organization, is more expensive than retaining a customer. So in terms of the opportunity, essentially it would be, how can you increase revenue for the organization, increase customer loyalty. Customer satisfaction is when you have a better understanding of who the customer is, by [00:17:00] consolidating all the data for that particular customer. And if, again going back to the data strategy, if data quality is bad, then the insights that you're generating are bad as well, right? So, if you have the right data strategy in place, you're better able and better equipped to fulfill the KPIs and the metrics that you are gauged on, or any organization is gauged on.

So opportunity is, with data you can help increase not only the revenues, improve the customer loyalty, but also the AI that you're building on top of it. So, gen AI is the word of the town these days and a lot of organizations have got in trouble because they used data that wasn't properly cleaned, it had copyright information, it couldn't filter out hate, abuse, profanity and so on.

So, if you're using [00:18:00] or building a pipeline or a use case on top of that, then your reputation is at risk. If there's no governance around that AI process as well, then there are a lot of factors that will degrade the brand of the organization that is using it. And with AI there's just this automation coming, which helps with productivity for not only employees, but folks who are interacting with the organizations. So, opportunity is a lot. There have been so many reports by Gartner and all those analysts where, given the numbers in terms of AI, if you have the right AI and data strategy, then opportunity is huge both in terms of dollar value and also the other metrics that are important to the organization.

Marc LeBlanc: Yeah. [00:19:00] You touched on AI a couple of times just now, and we've spent a little bit of time talking about data, but I'm wondering if you could talk a little bit about AI and what you're seeing. It's a term that is just kind of thrown out there. And obviously, you touched very specifically gen AI.

We know it's hot, it's an industry buzz right now and is being applied in very different ways. What are the couple of main ways that you see businesses adopting it and having success early? What are some of the patterns that you're seeing?

Amna Jamal: The pattern that I'm seeing, and especially at IBM as well as bigger organizations, too, it's just that it's more around productivity where I can write an email, I can send an email. But oftentimes if I use gen AI, that task that would take me 10 minutes now takes me two minutes. It is basically opening up my time to do more useful things. [00:20:00] So, opportunity when you use gen AI at an organizational level is essentially productivity, making lives easier. And that productivity number is also tied to the dollar value or the dollar number too, because you're more productive and do harder tasks, which will translate into something more valuable, right?

So, some of the use cases that are resonating a lot with the clients are around retrieval augmented generation, where especially employees, when they have like a lot of documents that they need to go through, how can you converse with the document effectively?

But of course there are risks associated with it as well because AI is not perfect, right? AI will hallucinate, it will give wrong answers, which can translate into something which will give you reputational [00:21:00] damage.

This space is evolving too fast now. LLMs, large language models were released like end of 2023, 2022. It's all a blur. But now it moved from large language models to now it's agentic AI where the focus is on automation. How can you as an enterprise connect with all these different tool sets that you have and automate the processes automatically using AI.

So again, it all goes back to productivity.  

Marc LeBlanc: You talked about when an organization starts using AI, if they're not being careful about the types of data they're using, that they can get themselves into trouble. Thinking back to the data strategy, how might organizations start to prepare and make sure that they're secured and on the right path to make sure they're using data that they're [00:22:00] one, allowed to and is gonna get them the outcomes that they need?

Amna Jamal: Well, that's a very good question. So, that is where the governance comes into the picture, where the policies and regulations are defined. Some of the regulations are external, for instance, like GDPR. And when it comes to AI, EU AI Act, and even within Canada, there are regulations, or bills rather, which will eventually turn into regulations.

So those are the regulations where you have a definition of what is allowed and what is not allowed. The second thing is how do you enforce those regulations onto using a tool set or a platform. So, when it comes to governing the data, you need to first... You need to be able to understand the data first to be able to enforce those policies and regulations.

For instance, let's say the regulation is, that you cannot disseminate [00:23:00] private and sensitive information, then how do you understand from the data that is there that like this column has, let's say, email addresses or phone numbers? So once you have discovered it and you have identified, using of course AI and the rules that you have written, that this is a column which has private and sensitive information. You can enforce the policies by masking the data at rest or in transit, so you're still compliant with those regulations. And you can define business rules within the platform as well, where you can, again, enforce those business rules onto the data that you're bringing in, making sure that you are compliant with the regulations.
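The discover-then-mask flow described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular governance platform works: the regular expressions are deliberately loose, the 0.8 threshold is arbitrary, and every name in it (classify_column, mask, mask_sensitive, the sample table) is hypothetical.

```python
import re

# Deliberately simple patterns; real discovery tooling uses far richer rules.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?[\d\s().-]{7,20}$")


def classify_column(values, threshold=0.8):
    """Label a column 'email', 'phone', or None by the share of matching values."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return None
    for label, pattern in (("email", EMAIL_RE), ("phone", PHONE_RE)):
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            return label
    return None


def mask(value):
    """Keep the last two characters and replace the rest with asterisks."""
    if len(value) <= 2:
        return "*" * len(value)
    return "*" * (len(value) - 2) + value[-2:]


def mask_sensitive(table):
    """Return a copy of `table` (column name -> list of strings) with detected columns masked."""
    masked = {}
    for name, values in table.items():
        if classify_column(values) is not None:
            masked[name] = [mask(v) if v else v for v in values]
        else:
            masked[name] = list(values)
    return masked


# Column names are intentionally cryptic: detection looks at the values, not the names.
table = {
    "xyz_abc": ["ana@example.com", "li@example.org"],
    "col_2": ["416-555-0100", "+1 (604) 555-0199"],
    "city": ["Toronto", "Vancouver"],
}

if __name__ == "__main__":
    for column, values in mask_sensitive(table).items():
        print(column, values)
```

Because the phone pattern only checks for digits and punctuation, it would also flag things like ISO dates, which is one reason discovered classifications are normally reviewed before a masking policy is enforced.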

Marc LeBlanc: Right. So, really understanding your industry is what I'm hearing, understanding what compliance and governance you have to adhere to.  

Yeah, I'm hearing a lot of other things as well, we've distilled this down to some simple messaging. But there's a lot at play. I'm [00:24:00] wondering how critical is technology and tooling around automation to a successful data plan?

Amna Jamal: It is. So it is very important, because a few years ago I've done a lot of work, manual work, where you're trying to figure out, this is my target data, this is my source data. How are they mapped together? Like, I would do it manually, but automation is something which I can use the time that I was doing manually to map these documents to actually understand what is coming out of the data. So, it plays a lot of a role, and saves a lot of time. And when you're doing manual work, you're bound to make errors as well.  
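The manual source-to-target mapping described above is the kind of task that simple automation can at least pre-fill. The following Python sketch is only an assumption-laden illustration: it suggests mappings purely from column-name similarity using the standard library's difflib, the 0.6 cutoff is arbitrary, and the function name suggest_mappings and the column names are hypothetical.

```python
from difflib import SequenceMatcher


def suggest_mappings(source_cols, target_cols, min_score=0.6):
    """Suggest a target column for each source column based on name similarity.

    Returns a dict of source -> best-matching target, or None when no target
    scores at least `min_score`, so that a person can review those by hand.
    """
    suggestions = {}
    for src in source_cols:
        best, best_score = None, 0.0
        for tgt in target_cols:
            score = SequenceMatcher(None, src.lower(), tgt.lower()).ratio()
            if score > best_score:
                best, best_score = tgt, score
        suggestions[src] = best if best_score >= min_score else None
    return suggestions


if __name__ == "__main__":
    source = ["cust_id", "email_addr", "xyz_abc"]
    target = ["customer_id", "email_address", "phone_number"]
    for src, tgt in suggest_mappings(source, target).items():
        print(src, "->", tgt if tgt else "needs manual review")
```

Columns with cryptic names still fall through to manual review, which matches the point made in the conversation: automation reduces the manual work and surfaces only the cases that need attention.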

So, that is where AI, gen AI is embedded in most of the platforms out there as well. That is because automation is the focus [00:25:00] to make the life of the employee easier and also ensure that there are fewer errors. And they're only notified when there are issues, so they can look into it rather than make those mistakes.  

Marc LeBlanc: Amna, thanks for the conversation today. If I kind of recap some of the themes we've had, we talked a lot about data and what it takes to get it right. Starting with the strategy, understanding what the business outcome is, building on that understanding that if you have a good strategy, well defined and you communicate that out to your staff, that you're gonna get the right skill sets, it'll be easier to navigate what tooling's going to support that outcome.

And we talked about AI and how to make sure we're getting good quality data that we're allowed to use. Because we don't wanna get ourselves into trouble with data that was maybe proprietary or copyrighted. Are there any parting thoughts that you'd like to give around data and AI? What's next?

Amna Jamal: Data will always be, I think, the center of [00:26:00] attention, and having the right data strategy in place will ensure that you are getting the right insights, complying with the regulations, and that the time to value is fast. And it's because you have the right data strategy.

And AI space, again, it's all built on top of data, but it's evolving really, really fast. And, I feel like organizations are still catching up, because A, they don't have the right data strategy. Second for AI, you have regulations and a lot of risks associated with it. So, it's an interesting space and I think it'll keep on evolving.

Marc LeBlanc: Thank you so much for sharing your stories and insights.  

Thank you for listening to Solving for Change. If you enjoyed this episode, leave us a rating and review on your favourite podcast service. Keep an eye on your favourite podcast service for our next [00:27:00] episode.


About our guest

Amna Jamal
Guest

Amna is a seasoned Data and AI Subject Matter Expert (SME) at IBM, boasting over 8 years of expertise in data management and data science. With a Ph.D. in Engineering from the National University of Singapore, she brings a wealth of knowledge and experience to the field, driving innovation and excellence at the intersection of data and artificial intelligence. At IBM, Amna works with enterprise clients to create solutions that address their unique data and AI needs.

About our hosts

Marc LeBlanc
Host

Marc LeBlanc is Director of the Office of the CTO at MOBIA. An experienced technologist who has worked in large enterprises, start-ups, and as an independent consultant, he brings a well-rounded perspective to the challenges and opportunities businesses face in the age of digital acceleration. A thoughtful and engaging speaker, Marc enjoys exploring how technology and culture intersect to drive growth for today’s enterprises. His enthusiasm for these topics made him instrumental in creating and launching this podcast.
