BioSymetrics' Shiva Amiri says addressing lack of data standards key to scaling machine learning

At the latest TechToronto, Shiva Amiri, CEO of BioSymetrics (and a panelist at BetaKit 150), talked about some of the lessons she’s learned about data and machine learning at her company. BioSymetrics is developing real-time machine learning technology to analyze massive amounts of biomedical data.

Amiri kicked off her presentation by highlighting some of the challenges companies face when working with data in the health space. She said one challenge is that companies looking to build machine learning technologies often have to deal with “data variety,” which means working with different types of data that come in from different sources.

Other challenges include the fact that data is often presented in different formats, and that organizations with large amounts of data don’t always know how to leverage it.

“The other issue is lack of standards, so a lot of different types of data are coming in different formats,” said Amiri. “Lack of scalability. A lot of different organizations are sitting on a ton of data and they don’t know how to scale their analytics capabilities. They don’t know how to do machine learning on all this stuff.”

In her presentation, Amiri also shared the analytic framework she uses to build machine learning-based solutions.

“You get your raw data. You do the pre-processing on it, getting the data ready. Then you have to select parts of the data you want to use to do machine learning. This is called feature selection. Then you do machine learning,” said Amiri.
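The four stages Amiri lists (raw data, pre-processing, feature selection, machine learning) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not BioSymetrics' implementation: the synthetic data, the function names, the standardization step, the "largest gap between class means" selection rule, and the nearest-centroid model are all hypothetical choices made for the example.

```python
import random
import statistics

def preprocess(rows):
    """Pre-processing: standardize each column to zero mean, unit variance."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stdevs = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]

def select_features(rows, labels, k):
    """Feature selection: keep the k columns whose class means differ most."""
    cols = list(zip(*rows))

    def gap(col):
        pos = [v for v, y in zip(col, labels) if y == 1]
        neg = [v for v, y in zip(col, labels) if y == 0]
        return abs(statistics.fmean(pos) - statistics.fmean(neg))

    ranked = sorted(range(len(cols)), key=lambda j: gap(cols[j]), reverse=True)
    return sorted(ranked[:k])

def fit_centroids(rows, labels, keep):
    """Machine learning: a nearest-centroid model, one mean vector per class."""
    centroids = {}
    for cls in set(labels):
        members = [[row[j] for j in keep] for row, y in zip(rows, labels) if y == cls]
        centroids[cls] = [statistics.fmean(c) for c in zip(*members)]
    return centroids

def predict(centroids, row, keep):
    """Assign the class whose centroid is closest in squared distance."""
    x = [row[j] for j in keep]
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

# Raw data (synthetic): column 0 carries signal, columns 1 and 2 are noise.
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
raw = [[y * 3.0 + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(5, 2)] for y in labels]

clean = preprocess(raw)
keep = select_features(clean, labels, k=1)
model = fit_centroids(clean, labels, keep)

# Accuracy on the same data the model was fitted on; it says nothing yet
# about how the model would behave on data from another source.
accuracy = sum(predict(model, r, keep) == y for r, y in zip(clean, labels)) / len(labels)
print(keep, accuracy)
```

Keeping each stage as its own function makes it easier to inspect what is passed from one step to the next, which is where the choice of data enters the process.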

Amiri suggested that while this framework helps companies build the technology, there are often roadblocks within the process. She pointed to a study she conducted of 1,000 autistic children and control patients across the US to show why this is the case.

“We used machine learning and we thought, wow, we came up with a new model for the diagnosis of autism,” said Amiri. “Then, when we tried our model on other data sets that we hadn’t tried yet, we got conflict reports almost everywhere we tried. So we didn’t have a model. We had to go back and think about what we did.”

Amiri said this experience not only helped inform her entire framework, but also showed that while data is crucial to building machine learning technologies, its value depends on it being used and applied carefully.

“This is why it’s important to have customer expertise, people in the loop, and also thinking really hard about what kind of data you’re pushing into your machine learning algorithm,” said Amiri. “So it’s not just about using deep learning, although it’s very sexy right now… You have to be able to identify the biases in the data and select the optimal set of data to get you to the machine learning stage.”
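A toy example can show how a bias in the data produces a model that looks strong on its original cohort and then fails elsewhere. The sketch below is purely illustrative and is not a reconstruction of the autism study: the cohorts are synthetic, and the "confounded" flag stands in for a hypothetical collection artifact (for instance, cases and controls measured at different sites) that makes a feature track the label in one dataset only.

```python
import random

def make_cohort(rng, n, confounded):
    """Return (feature, label) pairs.

    If confounded is True, the feature is shifted for label 1 because of a
    hypothetical collection artifact, not because of any real effect.
    """
    data = []
    for i in range(n):
        label = i % 2
        shift = 2.0 * label if confounded else 0.0
        data.append((rng.gauss(shift, 0.5), label))
    return data

def fit_threshold(data):
    """Fit a one-feature classifier: the midpoint between the class means."""
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(data, threshold):
    """Fraction of rows where 'feature above threshold' matches label 1."""
    return sum((x > threshold) == (y == 1) for x, y in data) / len(data)

rng = random.Random(1)
train = make_cohort(rng, 400, confounded=True)      # original cohort
external = make_cohort(rng, 400, confounded=False)  # data from elsewhere

threshold = fit_threshold(train)
train_acc = accuracy(train, threshold)
external_acc = accuracy(external, threshold)
print(train_acc, external_acc)
```

In this setup the classifier scores well on the cohort it was fitted to and close to chance on the external one, because the only thing it learned was the artifact. Evaluating on data from a different source before trusting a model is one way to surface that kind of bias.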

The next TechTO takes place on August 14. Get your tickets now!

Amira Zubairi