Rahul Singhal On Optimizing AI In Data Modeling With Humans In The Loop

Without human expertise, machines are simply pieces of metal. Machine learning itself implies the need to be taught. Likewise, AI cannot do anything of its own volition without being programmed and taught using human expertise. Chad Burmeister is joined by Rahul Singhal, the Chief Product and Marketing Officer at Innodata Inc. and former Chief Data Officer at IBM Watson. In this episode, he explains data analysis and modeling processes and how AI works in conjunction with humans to optimize them. As an expert on the subject, Rahul highlights the importance of having "humans in the loop" for the success of any AI venture, from data collection to fine-tuning. He also shares exciting upcoming projects from Innodata and explains what he thinks AI in sales will look like in the near future. Gain perspective from an expert and tune in to learn more.


I'm here with an expert in the AI field. He's formerly of IBM Watson, where he did a lot of work in machine learning, NLP, vision, speech, and more. Rahul Singhal is the Chief Product and Marketing Officer at Innodata. We're going to go deep and technical and answer some pretty significant questions.

Rahul, welcome to the show.

I'm glad to be here.

I would like to help our audience connect with you because you've been very successful in your career. I like to rewind the tape and start with when you were younger. What was your passion? What did you do then and how does that connect to now?

When I was young, I probably spent most of my time playing. I grew up in India and spent most of my childhood living on an Indian Army campus. My dad was head of engineering for the Indian Army Engineering College there. We had a great facility where I had access to swimming, tennis, and squash, which a lot of people in India didn't have at that time. I grew up playing table tennis, and I represented my state. Looking back, more than anything, being competitive has given me the drive to do better.

Were you the guy standing eight feet off the back of the table?

I was.

We never quite got that far. We had a table in our house, and I was always good at underspin and topspin, but not at playing 8 to 12 feet back off the end of the table.

It's a great sport. Very fast. One of the fastest sports after ice hockey. I love playing it. I spent five hours a day just playing table tennis until I graduated from school.

When was the last time you played?

I play in my basement with my kids, but competitively I haven't played for many years.

They make you turn the paddle around and hit with the small end from time to time. Thanks for sharing your background. AI has been around for a long time, and there was what people called the AI winter, when computing speeds maybe weren't there yet, but it's back in full force with things like Watson. Tell me about AI in your current product and service. How does it play? What does it do? Talk to us about that.

A lot of people ask me, "Is AI still a buzzword? Is this still real? Is it still there?" I read an article by Sundar Pichai, the CEO of Google, that came out in The New York Times and The Wall Street Journal. Four years ago, he said that AI is going to transform humanity and have more impact than electricity did.

He has reiterated that comment and said he still stands by it. He sees more and more transformation happening in our lives and for humanity as a whole, and AI is going to play a big role. I see that in our business: AI investments are only accelerating. Although AI has been around for years, I was very fortunate to be part of the journey when IBM commercialized the technology.

IBM has always been at the forefront of technology development. In 2013, Ginni Rometty announced that IBM would commercialize Watson as an AI platform and bring it to everyone. I would argue that AI has since become part and parcel of every enterprise technology strategy. If you think about where we are in the journey of technology, there are four eras. The first is the AI era of internet companies like Google, Facebook, and Baidu, which are way ahead of the market and have probably already absorbed it. We are now in the era of business AI, where companies like J.P. Morgan, other banks, and manufacturing companies are starting to invest in it.

In the next few years, you will see the era of prescriptive AI, where AI becomes part and parcel of most workflows and starts making recommendations and providing insights. Then, in the next 5 to 10 years, you will see autonomous AI, where machines will reason. To make all of this happen, you need clean data. Eighty percent of the time has to be spent cleaning and annotating data, and that's what Innodata is positioned to do. That's what we are focused on. We build platforms that allow unstructured content to be annotated and used to train models.

It's interesting because I've asked guests to compare AI to the internet: what do you think its impact will be relative to the internet? I've heard 5X, 10X, and all kinds of predictions, but tying it to electricity shows the exponential gains that can happen for all of humanity, whether it's curing cancer or feeding people around the world. Everything can be changed with AI. You're right that data matters in a big way. Garbage in, garbage out. Talk to me about the upfront data strategy needed to deploy AI successfully. How do you get the data ready? What happens in big data projects like this?

When you're building an AI model, you first have to have the data. Where do you get it? A lot of clients have data, but they may not have the rights to use it, and we find that all the time. At our company, we are doing a few things. For a very large and sophisticated company that is a leader in machine learning, we source content, and sourcing content isn't easy.

We also create data. Data that has to be scripted must be in the right format and diverse enough for the models to be trained on, and more often than not, we are creating our own content. I'll give you an example. We are working with a client that needs to train a model to recognize cellphones, and here, you need to create the content.

You need people of diverse ethnicities sitting in front of the camera, holding phones of different colors in a variety of positions, so that the machine learns to recognize that it's a cellphone, and you need something like 100,000 images. We are putting together a team of 500 people of different ethnicities around the world, essentially acting, to create that content.

This happens all the time with speech data collection. We have a client looking to train a model on Swedish, so we are putting together a team of 200 native Swedes who act out the material while we record them. There is this blend of acting and production with AI, and it's absolutely critical before you can start building models. Model building is not the hard part. The annotation and cleaning of the data is the hard part.

It reminds me of the Amazon Voice Conference a few years ago in New Jersey. It was the very first one, and there was a great speaker. He showed a picture of a tray of donuts and asked, "How many donuts are in the tray?" People in the room called out 5, 4, 7, and he said, "Why can't we all agree?"

It's because one of them was a donut hole, so it's not a donut. One was cut off halfway in the image, and one had a bite out of it. Everyone had a different opinion of what the data was. I think that's what you're getting at here: defining the data and annotating it so that we all agree on what the core unit of measure is.

It's challenging because we all have to recognize that machines are dumb. They don't understand until you teach them, and here, you're trying to teach machines to think like humans. As we know, no two humans think alike, so you have to provide machines with enough data that they can start to understand, and we are still far from the world of reasoning.
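The donut example is really a case of annotators disagreeing about labels. One standard way to measure that disagreement before training is Cohen's kappa, which compares how often two annotators agree with how often they would agree by chance. The sketch below is illustrative only: the tray and both annotators' labels are hypothetical, and nothing in the conversation says Innodata uses this particular metric.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of items both annotators labeled identically.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: probability both pick the same label at random,
    # given each annotator's own label frequencies.
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label for every item; kappa is
        # undefined (0/0), so this sketch treats it as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical tray, in order: three whole donuts, a donut hole,
# one cut off by the frame, and one with a bite taken out.
annotator_1 = ["donut", "donut", "donut", "not donut", "donut", "donut"]
annotator_2 = ["donut", "donut", "donut", "not donut", "not donut", "not donut"]
print(round(cohens_kappa(annotator_1, annotator_2), 2))  # 0.33
```

Here observed agreement is 4/6 and chance agreement is 0.5, so kappa is about 0.33: the two annotators agree on the whole donuts and the hole but split on the edge cases, which is exactly where a labeling guideline has to define what counts as a donut.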

Let's talk about the importance of humans in the loop, and maybe I can start by sharing a story. We partnered with a company out of Israel called Exceed.ai, an email AI that competes with Conversica. For six months, a handful of reps and I went in and answered the emails that were coming in.

Over time, we created canned responses to about a dozen questions that came up frequently. By the end of the six months, the team from Exceed came back and said, "98.9% of the time, you're using exactly what's already programmed in the system." The 1% delta is not that big of a deal.

You could flip the switch and no longer have to review any of those emails. They said, "To solve for the 1%, all you have to do is something pretty basic: change the verbiage a little in case you send something to the 1%." They said, "No. I'm interested in solving for the false-positive problem." It was interesting to go through that, because that's what human in the loop means: while the model is being trained, it's very important to have somebody look over everything, but over time, you can flip the switch and let the AI handle those basic decisions.

That's how it works. My perspective is that AI, for easy enough use cases, will get to 98% to 99% accuracy. I'll give you a story. We are working with a client to extract metadata from news articles. We've trained the model on 5,000 documents and we're getting 75% accuracy, but they need 100% accurate data, so we need humans in the loop who are constantly fine-tuning the model. We will probably get to 85% to 90% accuracy with an additional 5,000 documents, but every additional 1% after that is going to take at least 10,000 more documents. As you get to 90%, 91%, 92%, and 93%, the volume of data required for each step up in accuracy keeps going up.
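To make those diminishing returns concrete, here is a rough back-of-the-envelope sketch that treats the figures quoted above as assumptions: 5,000 documents for about 75% accuracy, another 5,000 to reach about 90% (taking the upper end of the 85% to 90% range), and at least 10,000 more for each additional percentage point after that. The function name and the piecewise model are hypothetical illustrations, not a formula from the interview.

```python
def estimated_documents(target_accuracy: int) -> int:
    """Rough total training documents for a target accuracy (whole percentage
    points), using the figures quoted in the interview as assumptions."""
    if not 0 <= target_accuracy <= 100:
        raise ValueError("target_accuracy must be between 0 and 100")
    if target_accuracy <= 75:
        return 5_000  # assumed: the first 5,000 documents gave ~75%
    if target_accuracy <= 90:
        return 10_000  # assumed: 5,000 more documents reach ~85% to 90%
    # assumed: at least 10,000 more documents per point above 90%
    return 10_000 + (target_accuracy - 90) * 10_000


for target in (75, 90, 93):
    print(target, estimated_documents(target))
```

Under these assumptions, going from 90% to 93% costs 30,000 documents, three times as much data as it took to reach 90% in the first place.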

The reason is simple. Where the model isn't reaching the right accuracy level, it needs to see enough diverse examples of the errors it's making, and there aren't enough of those examples, so it doesn't learn to recognize them. How do you train those models, and what techniques do you build?

This is where you get into active learning. We build an annotation or cleaning platform that identifies the types of records you need, so instead of 10,000 documents, you take 100 documents that contain enough diverse samples of that 1% error.
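Active learning covers several strategies. One common one, uncertainty sampling, asks the current model to score the unlabeled pool and sends only the items it is least sure about to human annotators. The sketch below uses least-confidence scoring; the document IDs, probabilities, and function names are hypothetical, and it is not a description of Innodata's platform. Real systems usually also enforce diversity among the selected items, which this sketch omits.

```python
from typing import Dict, List


def least_confidence(probs: List[float]) -> float:
    """Uncertainty score: 1 minus the probability of the most likely class."""
    return 1.0 - max(probs)


def select_for_annotation(predictions: Dict[str, List[float]], budget: int) -> List[str]:
    """Return the IDs of the `budget` documents the model is least sure about."""
    ranked = sorted(
        predictions,
        key=lambda doc_id: least_confidence(predictions[doc_id]),
        reverse=True,  # most uncertain first
    )
    return ranked[:budget]


# Hypothetical class probabilities from the current model for unlabeled articles.
predictions = {
    "doc-1": [0.98, 0.02],
    "doc-2": [0.55, 0.45],
    "doc-3": [0.90, 0.10],
    "doc-4": [0.51, 0.49],
}
print(select_for_annotation(predictions, budget=2))  # ['doc-4', 'doc-2']
```

After annotators label the selected documents, the model is retrained and the loop repeats, so each round of human effort goes to the examples the model is currently unsure about rather than to more of what it already handles.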

I think we're going to see a lot more of these techniques, and humans in the loop are going to be absolutely critical. I feel that you cannot have any AI production workflow without humans in the loop. In financial services, you can't have an error. Bloomberg can't afford a single error, because that one error could cost hedge funds millions of dollars. You still need humans to review it.
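The "flip the switch" pattern from the email example, where humans keep reviewing whatever the model is unsure about, is often implemented as a confidence threshold: predictions at or above it go out automatically, and everything else lands in a human review queue. Below is a minimal sketch, assuming the model reports a confidence score per email; the class, fields, template names, and threshold values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    email_id: str
    template: str      # canned response the model would send
    confidence: float  # model's confidence in that choice, 0.0 to 1.0


def route(
    predictions: List[Prediction], threshold: float
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into (auto_send, human_review) queues."""
    auto_send: List[Prediction] = []
    human_review: List[Prediction] = []
    for p in predictions:
        # Confident answers go out automatically; the rest wait for a person.
        (auto_send if p.confidence >= threshold else human_review).append(p)
    return auto_send, human_review


inbox = [
    Prediction("e1", "pricing", 0.99),
    Prediction("e2", "not_interested", 0.97),
    Prediction("e3", "meeting_request", 0.62),
]
auto_send, human_review = route(inbox, threshold=0.95)
print([p.email_id for p in auto_send])     # ['e1', 'e2']
print([p.email_id for p in human_review])  # ['e3']
```

During training, a threshold above 1.0 sends every email to a person; flipping the switch amounts to lowering it once the model's choices match the humans' often enough. In a domain like the Bloomberg example, it may never drop low enough to remove review entirely.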

That probably leads to my next question, which is why do most or a large percentage of AI projects never make it into production? They never get off the ground. What's the reason for that?

I think the chairman of IBM, Arvind Krishna, came out publicly and said 80% of IBM projects are failing for lack of three things: customers don't have enough data, they don't have enough annotated training data, and they don't have enough subject matter experts available to clean that data.

Think about a bank or a healthcare firm. The data scientists don't want to do the grunt work of cleaning the data. They don't want the burden of annotating 10,000 to 20,000 documents just because they want to build a model. You can't give it to a business analyst either, because they already have a day job and this would come on top of it. So you need a third party with the subject matter expertise to do it. That scalability doesn't exist, which is where we come in.

Our thesis is that you need scalable domain experts to productionize these applications. Even once you've got enough people, you still need platforms and subject matter experts reviewing the output. Remember ERP projects: they used to take years of implementation, large budgets, and change management. Likewise, if you don't have executive sponsorship from the top, it's going to be hard to productionize these AI applications.

It reminds me of a project we did at RingCentral, where I was running the inside sales group, an SDR/BDR team. The data analysts said, "Here's a list of 30,000 leads. I don't know how to scrub for titles properly. You guys are the business experts. Go." We were thinking, 30,000 rows? What are you telling me? We have to sort through all of this?

We discovered a company that had already done the hard work of Boolean logic and figuring out titles, and they said, "Give us the list of 30,000. We'll group them, and we'll know which data is good, which is bad, and what we can throw out." You're right that it's a challenge, because it's either the analyst or the businessperson, and neither wants to do the hard work. Thinking about your product: if AI didn't exist, would the company still exist, or would it be different?

Fortunately, Innodata is a 30-year-old company. We are the backbone of large information publishers: we take unstructured content and digitize it. The company would still exist. If AI weren't there, we would be focused on something else, because we like to call ourselves leaders in data engineering. If data didn't exist, we wouldn't exist.

You've worked with a lot of well-known big brands and large multinational companies. Have you seen any unique deployments of AI in the sales motion, and what are you seeing in AI for sales?

I've seen a few. One interesting thing we use for marketing is lists. A lot of list companies have intent data: they're able to use cookie data to gauge intent to purchase. I think that's interesting because it allows us to do hyper-targeting and hyper-personalization from an outbound sales perspective.

We are also experimenting with some interesting outbound sales approaches ourselves. Innodata has a subsidiary called Agility PR, where we index 2 billion news articles a year. For any given topic, we can curate stories that have been published in the news, use AI to identify the individuals in those stories, and then send them outbound messaging.

We are trying to hyper-target from a sales perspective, and we are experimenting with other sales and outbound methodologies using AI. One that I'm passionate about is GPT-3 from OpenAI, which is able to rewrite messaging. How can we avoid having a salesperson spend an inordinate amount of time writing a message, or send a templatized one? Can a machine create that message instead? We are experimenting with a variety of tools and techniques, and hopefully we'll be able to release something soon.

We partnered with a company called Nova.ai a few years ago when we went to market. It would look at your LinkedIn profile and generate openers. For example, it would know whether you're in hot or nice weather: "I see it's very hot in New Jersey," or "nice and cool," or whatever. Or, "I see we went to the same school." Or, "Nice article you wrote."

I think you're right that people want to feel they're not just being spammed, and when you can open with a sentence or two that's highly customized and personal, it changes the entire dynamic of the conversation. That is the future of AI, especially in the sales motion but in general too. When we opened, you said AI is going to be bigger than electricity. Looking a few years out, what would you end our conversation with?

I started our conversation with that. I think we are just at the beginning of truly transforming industry and humanity through AI. Years from now, I expect reasoning to become part and parcel of AI models. You might've read about AI beating the top poker players. That involves a large amount of reasoning: which permutations and combinations do you consider to make those kinds of decisions?

You might've heard about IBM's Project Debater, the first technology where a machine can reason well enough to debate. How will that technology show up in commercial applications? I think we'll see a lot more of those coming. I'm very optimistic about AI, and I think the best is yet to come.

We've been talking with Rahul Singhal, the Chief Product and Marketing Officer at Innodata. Thanks for investing some of your time with us to share this information with our audience. Thank you so much.

Thanks for having me.

About Rahul Singhal

Prior to joining Innodata, Mr. Singhal was Chief Product Officer at Equals 3, an AI marketing platform which won several accolades including Gartner Cool Vendor, CES Top 5, and IBM Watson ISV award. Before Equals 3, Mr. Singhal spent 12 years at IBM, the last three of which he spent leading the product portfolio for the Watson Platform which included a collection of APIs for vision, speech, data and language. During his tenure at Watson, he grew usage of the services by over 100X and launched over 15 new services.  

Prior to Watson, Mr. Singhal was a member of IBM’s Strategy and Transformation Practice. Mr. Singhal is also an Adjunct professor at New York University (NYU) where he teaches Competitive Strategy and Advanced Experimental Design and Machine Learning. He lives in New Jersey with his wife and two children and enjoys swimming, playing tennis and reading.
