Indian machine learning operations (MLOps) startup Scribble Data has raised $2.2 million USD ($2.8 million CAD) in seed funding to fuel its North American expansion plans.
The Bangalore, India-founded firm has already begun building a presence in Canada, where it says it has seen strong interest in the feature engineering for machine learning (ML) space. Feature engineering refers to the process of transforming raw data into trusted feature sets that can be used to train ML models.
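To make the term concrete, here is a minimal sketch of feature engineering in plain Python. It is an illustration only, not Scribble Data's software: the order records, the `build_features` helper, and the feature names are all hypothetical. It shows raw transaction rows being aggregated into one feature row per customer, the kind of table an ML model might be trained on.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw records: (customer_id, order_date, amount).
raw_orders = [
    ("c1", date(2022, 1, 3), 40.0),
    ("c1", date(2022, 1, 20), 60.0),
    ("c2", date(2022, 1, 10), 15.0),
]


def build_features(orders, as_of):
    """Aggregate raw order rows into one feature row per customer."""
    grouped = defaultdict(list)
    for customer_id, order_date, amount in orders:
        grouped[customer_id].append((order_date, amount))

    features = {}
    for customer_id, rows in grouped.items():
        amounts = [amount for _, amount in rows]
        last_order = max(order_date for order_date, _ in rows)
        features[customer_id] = {
            "order_count": len(rows),
            "total_spend": sum(amounts),
            "avg_order_value": sum(amounts) / len(amounts),
            # Recency is measured against an explicit reference date so the
            # same inputs always produce the same feature values.
            "days_since_last_order": (as_of - last_order).days,
        }
    return features


features = build_features(raw_orders, as_of=date(2022, 2, 1))
```

Here `features["c1"]` holds an order count of 2, a total spend of 100.0, and 12 days since the last order. Passing the reference date in explicitly, rather than reading the clock, is one simple way to keep a feature set reproducible.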
Scribble Data plans to invest the capital in product development and add more customers in the North American market. To get there, the startup has established an office in Toronto, where it hopes to dip into the city’s deep ML talent pool.
“We incorporated in Canada, specifically, because we saw a lot of inbound interest in the space that we are in, which is feature engineering for machine learning, coming out of North America,” Scribble Data co-founder and COO Indrayudh Ghoshal told BetaKit in an interview.
Scribble Data’s all-equity, all-primary seed financing, which closed earlier this month, was supported by a group of India-based investors. Blume Ventures led the round, with support from Log X Ventures, Sprout Venture Partners, Vivek Gour (former CFO of professional services firm Genpact), and Ganesh Rao (partner at Mumbai law firm Trilegal).
Founded in 2016 by Ghoshal and Bangalore-based CEO Venkata Pingali, Scribble Data’s initial focus was on reducing friction in the data science process, but the company narrowed its approach to building a feature engineering platform in 2019.
Today, Scribble Data offers a feature store: a collection of applications that streamline the feature engineering lifecycle to help companies train ML models and reduce the time it takes to bring data science products and use cases to market. The company’s apps can be integrated with customers’ existing tech stacks or used to build their own feature engineering apps.
“With more organizations effectively becoming data companies, there is a proliferation of high quality, compliant feature sets for ML and sub-ML use cases in an organization,” said Blume VP Anirvan Chowdhury. “And those feature sets will need to be managed, re-used and served in the most effective manner into ML models or other sub-ML use cases.”
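The idea of managing, re-using, and serving feature sets can be sketched with a toy in-memory registry. This is an assumption-laden illustration, not Scribble Data's product or any real feature store's API: the `FeatureStore` class, its methods, and the `customer_spend` feature set are hypothetical names chosen for the example.

```python
class FeatureStore:
    """Toy in-memory registry of versioned feature sets (hypothetical API)."""

    def __init__(self):
        # Maps (feature_set_name, version) -> {entity_id: feature dict}.
        self._sets = {}

    def register(self, name, version, rows):
        """Store a feature set under a name and version; versions are immutable."""
        key = (name, version)
        if key in self._sets:
            raise ValueError(f"{name} v{version} already registered")
        self._sets[key] = dict(rows)

    def latest_version(self, name):
        """Return the highest registered version number for a feature set."""
        versions = [v for (n, v) in self._sets if n == name]
        if not versions:
            raise KeyError(name)
        return max(versions)

    def get(self, name, entity_id, version=None):
        """Serve one entity's features, defaulting to the latest version."""
        if version is None:
            version = self.latest_version(name)
        return self._sets[(name, version)][entity_id]


store = FeatureStore()
store.register("customer_spend", 1, {"c1": {"total_spend": 100.0}})
store.register("customer_spend", 2, {"c1": {"total_spend": 130.0}})

# Two different consumers re-use the same managed feature set.
churn_input = store.get("customer_spend", "c1")             # latest version
audit_input = store.get("customer_spend", "c1", version=1)  # pinned version
```

Keeping versions immutable means a model trained on version 1 can still be audited against exactly the data it saw, while newer consumers pick up version 2 by default.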
Ghoshal says there is a “disconnect” between two key groups of people in the data world: data engineers and data scientists, the latter of whom focus on applying data to solve business problems.
“There’s a layer in between where the data scientists are dependent on quality data being available for them in order for them to go on and do other stuff, and that is where the ML engineering workflow comes into the picture, which is people who can interface with the data scientist as well as the data engineer,” said Ghoshal. “That’s the gap that we operate in.”
“Within that gap, the large space that we are trying to own is feature engineering,” he said. “What does it mean to massage data in specific ways such that data scientists are able to use it to answer specific business problems?”
Ghoshal said that, “most often,” these business problems can be addressed by training ML models. But he added that data scientists also use feature engineering to address sub-ML problems, from advanced analytics to decision intelligence systems.
According to Ghoshal, this category of software has come about over the last three or four years, as tech giants like Amazon went ahead and built their own in-house feature stores.
“Since then, there’s been a realization that much smaller companies also … need software like this,” said Ghoshal.
The sector also features players like San Francisco-based Tecton, which was launched in 2019 by the Uber engineers who built the company’s machine learning platform. “When that happened, there was a lot of interest in this space,” said Ghoshal. Following in Amazon’s footsteps, AT&T has teamed up with California AI startup H2O.ai to launch its own feature store.
North of the border, Toronto’s Shakudo is taking a similar approach, building a platform designed to help data science and ML teams turn their AI solutions into products more quickly by reducing their need for engineers.
Like Tecton, Scribble Data wants to help smaller companies access these capabilities through its feature store. The startup targets firms with data science teams of up to 20 people, and annual revenue of less than $500 million, and has a base of customers in the e-commerce, edtech, FinTech, and healthtech verticals.
Within the feature engineering space, Ghoshal sees Scribble Data’s modular approach as a differentiator. “We’ve taken a modular approach to building a feature store where we understand that there are companies that have existing workflows and existing bits of technology, and they’re looking to streamline how they go from their existing systems into this machine learning world,” said Ghoshal.
Privacy and compliance are also key focuses for Scribble Data, and the startup plans to invest in adding and expanding its privacy and compliance features. Scribble Data is also looking to build “more out of the box capabilities” for certain verticals like FinTech and strengthen its integrations with third-party data solutions.
The company’s broader product roadmap also includes a low-code consumption interface and additional apps that help bring data teams closer to anti-money laundering, benchmarking, personalization, and recommendation solutions.
Scribble Data currently has 11 employees: nine based in India and two in Toronto. The city has a large and fast-growing tech talent pool, the third biggest in North America behind San Francisco and New York, according to CBRE.
“In terms of talent being available in our specific niche … Toronto is a great hotbed for that,” said Ghoshal, who noted that the startup aims to add six or seven more employees in the city over the next 12 months.