This post was originally published by @Marina T Alamanou PhD at Medium [AI]
“Every second of every day, our senses bring in way too much data than we can possibly process in our brains.” — Peter Diamandis, Chairman/CEO, X-Prize Foundation.
AI startups for Data Aggregation and Analysis during Drug Development
Even though the public reputation of pharmaceutical companies appears to be eroding, and a significant decline in the public’s perception of the transparency, openness and authenticity of drug makers “is in the air, everywhere I look around”, pharmaceutical companies do play a positive role in society. Corruption and lack of transparency aside, pharma’s biggest problem is its conservative approach to a process (drug development) that no longer works, owing to a lack of innovation amid digital disruption, rapid technological advances and other issues such as poor data reproducibility. Accordingly, “an army” of AI pharma startups is being set up to deal with pharma’s problems.
Let’s now look at some of these data-driven startups that employ ML and AI for data aggregation and analysis during drug development (petabytes of data).
AccutarBio (New York US, 2015) is a Shanghai- and Brooklyn-based AI pharma company that has raised millions of dollars in funding to use 3D projections of chemical structures to develop anti-cancer drugs.
The company employs AI for drug discovery, reporting that it outperforms traditional drug pocket prediction, drug-target complex conformation prediction, drug-target binding affinity prediction and drug property (ADME) prediction, and so far offers:
- a data-driven atom-based scoring function trained with 100,000 protein crystal structures containing information of >100 million amino acid side chains,
- a dynamic deep neural network specifically designed for chemical informatics and
- drug pocket side chain conformation prediction and drug docking.
AccutarBio researchers have a number of papers on the arXiv pre-print server. The company recently received $15 million in funding (including money from Chinese AI/facial recognition company YITU) and now partners with Amgen.
Ardigen (Kraków Poland, 2015) is a Polish bioinformatics company, part of the Selvita Group, that is trying to accelerate drug development by decoding the microbiome, designing immunity and providing digital drug discovery services.
Ardigen’s neoantigen prediction platform called “ArdImmune Vax” employs AI to identify an optimal set of neoantigens as targets for cancer vaccines or adoptive cell therapies. This technology is also very well suited for the design of vaccines for infectious diseases.
In April 2018 Ardigen demonstrated its AI capabilities by taking second place in the prestigious NCI-CPTAC DREAM-Proteogenomics Challenge (a community-based collaborative competition to answer key questions in cancer proteomics). In April 2020 Selvita S.A., one of the largest preclinical contract research organisations in Europe, and Ardigen signed a grant agreement to create the HiScAI (High Content Screening Artificial Intelligence) Technology Platform, dedicated to the study of phenotypic changes in cells treated with a drug candidate, using ML and AI to analyse the data from high content screening.
Biorelate – Curating Truths with AI (Manchester UK, 2014) helps scientists solve the most difficult biomedical challenges of today by curating truths from existing knowledge, enabling smarter and faster research and development.
Biorelate’s cognitive computing platform Galactic (Galactic-AI™) can speed up the research process by collecting and curating more than 30 million biomedical research text sources. With up to 80% of biomedical data thought to be unstructured, the platform helps researchers to form a clearer view of the current state of research and gain invaluable insights. Dr Daniel Jamieson, CEO and founder of Biorelate, announced in May 2020 that Biorelate had made its cloud-based web tool Galactic available to researchers to help drug discovery efforts while lab access was restricted due to the global pandemic.
BioSymetrics offers:
- a prediction platform for molecular mechanisms,
- in vivo barcoding,
- data set de-noising and
- identification of lead compounds from gene/diseases prediction.
Moreover, they offer a clinical insights engine that can be used to improve value-based care initiatives. Combined with the four solutions above, it can enrich pharmaceutical research and discovery efforts with clinical data.
In August 2020 BioSymetrics announced that it had joined Accenture’s partner ecosystem, an integral part of Accenture’s INTIENT cloud-based R&D platform, which is designed to help life sciences organisations improve end-to-end productivity, efficiency and innovation from drug discovery through clinical and patient services.
Biotx.ai (Berlin Germany, 2017) offers an AI tool that helps to reliably find complex patterns in high-dimensional biomedical data. Such data is difficult to analyse because of small patient cohorts, small sample sizes and many other factors; the platform is designed to find complex interactions within the data and to retrieve highly accurate predictive biomarkers.
Traditionally, big data analysis has involved millions of subjects and few features, a shape that is easy to train AI on. Clinical trials and genetic testing, by contrast, yield very small patient cohorts: few subjects (few patients) with millions of features each (an incredibly large number of data points, like the entire genome), a shape that is extremely hard to train AI on. This is something biotx.ai calls “wide data”. Wide data sets are incredibly hard to analyse, so most pharma companies avoid this approach entirely, opting instead to sequence entire populations.
So, biotx.ai has designed its AI algorithms to make wide data manageable by separating meaningful findings from the noise, leading to the discovery of previously undetectable complex genetic patterns and allowing for more accurate 1) prediction of disease status and drug response (predictive biomarkers) and 2) patient stratification for clinical trials.
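To see why wide data is so hard, here is a minimal Python sketch. It is an illustration only, not biotx.ai’s algorithm, and the cohort sizes, feature counts and random seed are made-up assumptions. With few subjects and thousands of pure-noise features, some feature will correlate strongly with the outcome by chance alone, which is exactly the noise that has to be separated from meaningful findings.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_subjects, n_features, seed=0):
    """Largest |correlation| between an outcome and pure-noise features."""
    rng = random.Random(seed)
    # Hypothetical outcome: first half of the cohort is 0, second half is 1.
    half = n_subjects // 2
    outcome = [0] * half + [1] * (n_subjects - half)
    best = 0.0
    for _ in range(n_features):
        # Each feature is random noise with no real link to the outcome.
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
        best = max(best, abs(pearson(feature, outcome)))
    return best


wide = strongest_chance_correlation(n_subjects=20, n_features=5000)
tall = strongest_chance_correlation(n_subjects=5000, n_features=20)
print(f"wide data (20 subjects x 5000 features): {wide:.2f}")
print(f"tall data (5000 subjects x 20 features): {tall:.2f}")
```

In the wide setting the best noise feature looks like a convincing biomarker even though none exists, whereas in the tall setting the strongest chance correlation stays close to zero.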
Causaly Inc (London UK, 2017) offers a semantic AI platform that reads collections of scientific articles and extracts causal associations through linguistic and statistical models. It targets one of the most difficult biomedical challenges: increasing productivity in literature reviews by filtering out false positives.
Every month something like 100,000 biomedical articles are added to the 30+ million already published, which makes it almost impossible (as well as time-consuming and inefficient) to decipher key relationships and find emerging discoveries in the vast data ocean of biomedical research. For this reason Causaly’s AI reads and interprets biomedical literature similarly to how humans do; the company says it has read everything ever written in biomedicine and can visualise relevant relationships within seconds. Causaly is working with several pharma and biotech companies, including Novartis, as well as with hospitals and academia.
Datavant, Inc (San Francisco US, 2017) applies AI to the clinical trial process. It organises and structures healthcare data to produce actionable insights for the design and interpretation of clinical trials, aggregating and analysing biomedical data through ML to lower the time, cost and risk of drug development. Datavant specialises in breaking down silos and analysing health data securely and privately.
In July 2020, Medable Inc, a leading software provider for decentralised clinical trials, and Datavant announced a partnership that will help clinical trial teams easily integrate multiple data sources to accelerate decentralised trial design, recruitment and data management.
Deep Intelligent Pharma (DIP) (Beijing China, 2017) is a global start-up dedicated to empowering and accelerating drug discovery, development and registration through the most advanced AI technologies. With its end-to-end AI-driven platforms, the company enables clients to efficiently move compounds from the lab to the post-marketing stage with great quality.
They offer the following AI solutions:
- knowledge graph, a biomedical research tool that brings knowledge to scientists in real time by connecting millions of knowledge nodes,
- drug discovery platform,
- organic synthesis system,
- clinical data platform,
- regulatory management platform,
- top-quality medical translation services and many more.
They operate across China, the US and Japan. DIP has raised $26.1 million from Sequoia Capital China and ZhenFund. Its partners include Roche, GSK and Bayer.
Data4Cure’s (California US, 2013) Biomedical Intelligence Cloud platform and services help pharmas make more informed decisions using bioinformatics, ML and AI applications built on top of the largest repository of semantically linked biomedical data and literature. By inferring and organising knowledge from thousands of genomic, phenotypic and clinical datasets, they allow researchers to identify new targets and biomarkers, repurpose drugs and identify disease pathways.
Central to the platform is a dynamic biomedical knowledge graph called CURIE™ spanning over 1 billion biomedical facts and relations continuously inferred from thousands of datasets (both public and customer-specific) and millions of publications.
Data2Discovery (Utah US, 2012) uses AI to find hidden connections and new insights in diverse, linked datasets, helping researchers to understand and treat diseases by connecting data in new ways.
Genialis (Texas US, 2011) uses AI to analyse multi-omics next-generation sequencing data, allowing researchers to reveal previously unseen patterns across large, heterogeneous datasets to predict targets and biomarkers. Over the course of the year, Genialis initiated collaborations with numerous cutting-edge biopharma companies, including Checkmate Pharmaceuticals and Oncologie, as well as renowned biotechnology leader Thermo Fisher Scientific.
With partners like Checkmate and Oncologie, as well as others, the goal has been to leverage RNA sequencing and clinical trial outcomes data to model gene signatures that stratify patients based on predicted drug response. Moreover, Genialis and Thermo Fisher Scientific struck up a collaboration to provide the scientific community with a comprehensive set of tools for RNA sequencing.
In 2019 the Alliance for AI in Healthcare (AAIH) was launched — the first industry advocacy organization dedicated to promoting responsible adoption and use of AI to improve healthcare outcomes — and Genialis was a founding member of the AAIH.
Helix (Georgia US, 2017) uses AI to respond to verbal questions and requests in a lab setting. This allows researchers to increase efficiency, improve lab safety, keep current on relevant new research and manage inventory. HelixAI is participating in the Amazon Alexa accelerator.
Evid Science (California US, 2017), trusted by the largest institutions in the world, puts 70M+ evidence-based data points at your fingertips, all backed by the literature. Evid Science’s patented AI, which they claim can read up to 25 million articles in an hour, has already processed the publicly available medical literature across all endpoints, interventions and therapy areas and updates nightly, reducing weeks (or even months) of work to a few clicks and enabling customers to make faster, smarter, evidence-based decisions. Evid Science’s researchers have a number of papers.
Iris.ai (Oslo Norway, 2015) uses AI to map out existing knowledge from published research, patents and internal R&D content. Moving beyond limiting keywords, endless result lists and citation bias, Iris.ai positions itself as an AI assistant for cross-disciplinary early-stage research projects. It builds document “fingerprints” and finds similar documents by comparing them, based on a combination of keyword extraction, word embeddings, neural topic modeling and other natural language understanding techniques.
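The idea of comparing document “fingerprints” can be sketched in a few lines of Python. This is a deliberately simplified illustration that uses only keyword counts and cosine similarity; it is not Iris.ai’s method, which also draws on word embeddings and topic models, and the stop-word list and example titles are invented for the demonstration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not a real product setting).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def fingerprint(text):
    """Keyword-count 'fingerprint': lower-cased words minus stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine_similarity(fp_a, fp_b):
    """Cosine similarity between two keyword-count fingerprints."""
    dot = sum(count * fp_b[word] for word, count in fp_a.items())
    norm_a = math.sqrt(sum(c * c for c in fp_a.values()))
    norm_b = math.sqrt(sum(c * c for c in fp_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical query and paper titles.
query = fingerprint("Deep learning for protein binding affinity prediction")
paper_a = fingerprint("Predicting protein ligand binding affinity with deep neural networks")
paper_b = fingerprint("Inventory management software for laboratory freezers")

sim_a = cosine_similarity(query, paper_a)
sim_b = cosine_similarity(query, paper_b)
print(f"query vs paper_a: {sim_a:.2f}")
print(f"query vs paper_b: {sim_b:.2f}")
```

The related paper scores well above the unrelated one even though the titles are worded differently. Note that pure keyword counts miss the link between “prediction” and “predicting”, which is one reason embeddings are used in practice.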
Intelligencia’s (New York US, 2017) iNsight, a proprietary data cube, integrates structured and unstructured data from a host of data sources. With clinical development having a success rate of ~10%, the company uses ML models to assess the probability of technical and regulatory success of an asset (drug) at any stage of clinical development, across Phases 1–3. Further, it interprets the reasons behind its estimates and provides insights into the drivers (positive or negative) of that probability.
Intellegens’s (Cambridge UK, 2017) — a spin-out from the University of Cambridge — first commercial product Alchemite utilises AI to learn underlying correlations in fragmented datasets with incomplete information, allowing researchers to estimate missing knowledge of how candidate drugs act on proteins.
The Alchemite™ platform is based on cutting-edge deep learning algorithms that can see correlations between all available parameters, both inputs and outputs, in fragmented, unstructured, corrupt or noisy datasets that are as little as 0.05% complete. The generated models can predict missing values, find errors and optimise target properties with greater levels of accuracy than traditional approaches where complete data is needed. Intellegens’ researchers have a number of papers and Intellegens is one of 10 startups selected for the ATI Boeing Accelerator to boost UK innovation.
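As a toy illustration of using correlations between parameters to predict missing values, the Python sketch below fits a least-squares line on the rows where two columns are both present and uses it to fill a gap. It is an assumption-laden stand-in, not Alchemite’s deep learning algorithm, and the compound table, column names and numbers are hypothetical.

```python
def fit_line(pairs):
    """Least-squares slope and intercept for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def impute(rows, known, missing):
    """Fill None in column `missing` using its linear relation to `known`."""
    complete = [(r[known], r[missing]) for r in rows
                if r[known] is not None and r[missing] is not None]
    slope, intercept = fit_line(complete)
    filled = []
    for r in rows:
        r = dict(r)  # copy so the input rows are left untouched
        if r[missing] is None and r[known] is not None:
            r[missing] = slope * r[known] + intercept
        filled.append(r)
    return filled


# Hypothetical assay table: activity was never measured for compound C.
rows = [
    {"compound": "A", "logp": 1.0, "activity": 2.1},
    {"compound": "B", "logp": 2.0, "activity": 3.9},
    {"compound": "C", "logp": 3.0, "activity": None},
    {"compound": "D", "logp": 4.0, "activity": 8.0},
]
completed = impute(rows, known="logp", missing="activity")
print(completed[2])
```

A single straight line is the simplest possible model of a correlation; handling many parameters at once, with very few complete rows, is the hard part that real platforms address.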
Innoplexus (Frankfurt Germany, 2011) is a consulting-led technology and product development company focusing on big data and analytics, using AI to generate insights from billions of disparate data points from thousands of data sources. In this way they allow researchers to improve decision-making by seeing information in context from biomedical data sources including publications, clinical trials, congresses and theses.
Innoplexus — with over 250 employees and 120+ patent filings including 14 grants in AI, ML and blockchain technologies — can generate real-time insights from hundreds of terabytes of structured and unstructured private and public data, thereby facilitating continuous, informed decision-making for its customer base at an unprecedented speed. On April 14, 2020, Northern Data AG — providers of high-performance computing solutions — announced a strategic partnership with Innoplexus AG to accelerate drug discovery and development against COVID-19 and other diseases.
InveniAI (Connecticut US) provides AI-driven technology solutions for accelerating innovation crucial to the decision making, growth and success of an organization, serving a variety of industries including healthcare (biopharmaceuticals, consumer healthcare, animal therapeutics, vaccines), food/nutrition, agri-tech, aviation, chemicals and government.
They have created a technology platform, AlphaMeld®, that has been central to creating significant clinical, transactional and strategic impact in over 150 global collaborations. They also have a therapeutic drug pipeline that is now being developed by their sister company, BioXcel Therapeutics (which is developing high-value therapeutics in neuroscience and immuno-oncology using AI). In July 2020 InveniAI announced a strategic collaboration with GlaxoSmithKline Consumer Healthcare to leverage AlphaMeld®.
Insitro (California US, 2018), a top AI startup, integrates ML techniques into drug development. It combines life sciences, engineering and data science to define problems, design experiments, analyse the data and derive insights that lead to new therapeutics.
They use an integrated model of disease spanning in vitro cellular systems and in silico machine learning models, creating the insitro model, to discover previously unseen disease subtypes and search for interventions that move them from an “unhealthy” to a “healthy” state. The company partnered with Gilead Sciences to find medicines to treat a liver disease called nonalcoholic steatohepatitis (NASH) because of all the related human data that Gilead has amassed over time. In May 2020 it was announced that the company had raised $143 million in an oversubscribed Series B financing.
Quertle’s (Nevada US, 2008) flagship product Qinsight™ enables unparalleled discovery of literature through AI-powered searching, integration, organization, and presentation including predictive visual analytics. Qinsight, which covers journal articles, patents, clinical trials, treatment protocols and much more, is in use by pharmaceutical and biotechnology companies, universities, research centers, and healthcare providers around the world.
Linguamatics (Cambridge UK, 2001), an IQVIA company, utilises AI to extract and analyse text. Linguamatics is a software company providing high performance natural language processing based text mining software. The software enables the rapid extraction of business critical facts and relationships from large document collections. Linguamatics’ text mining software can be used for drug discovery and basic research, patents analytics, drug safety and pharmacovigilance, precision medicine, voice of the customer, real world data, clinical trial analytics, regulatory compliance, clinical research and many more.
LabTwin (Berlin Germany, 2018) describes itself as the world’s first voice- and AI-powered digital lab assistant, working alongside scientists at the point of experimentation. It uses AI to understand voice-based commands and transcribe voice-based notes, allowing researchers to take notes and organise lab documentation faster and with less effort. In October 2019 a new partnership was announced between LabTwin and ABI-LAB, a life science incubator and accelerator that supports biotech, medtech and medical data startups. In June 2020 LabTwin was named a Gartner Cool Vendor in Life Sciences. Gartner is a leading research and advisory company that recognises interesting, new and innovative vendors, products and services.
Owkin’s (New York, US, 2016) mission is to use ML to develop better drugs for patients. They offer:
- Owkin Loop, which connects medical researchers with high-quality datasets from leading academic research centers around the world and is powered by the two main components of Owkin’s Software Stack: Owkin Studio and Owkin Connect,
- Owkin Studio, designed for medical researchers to apply AI to their research cohorts and scientific questions,
- Owkin Connect, which enables the secure and federated training and validation of AI techniques over the Loop and
- Owkin Lab, an in-house award-winning interdisciplinary team of data and biomedical scientists who collaborate with partners to train AI models and design bespoke, end-to-end research solutions to meet their needs.
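Federated training, mentioned above for Owkin Connect, means that each institution trains on its own data and only model parameters are shared. The Python sketch below shows the general idea using federated averaging on a one-parameter model; it is a generic illustration under made-up assumptions (the datasets, learning rate and number of rounds are hypothetical), not Owkin’s implementation.

```python
def local_update(w, data, lr=0.05, steps=5):
    """A few gradient-descent steps on one site's private data (model: y = w * x)."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w_global, sites):
    """One round: every site trains locally, then only the weights are averaged."""
    total = sum(len(data) for data in sites)
    # Weight each site's result by how many samples it holds.
    return sum(local_update(w_global, data) * len(data) for data in sites) / total


# Hypothetical private datasets held by three research centres (true w is 2).
sites = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)],
]

w = 0.0
for _ in range(20):
    w = federated_round(w, sites)
print(f"learned w: {w:.4f}")
```

The raw (x, y) records never leave their site; the coordinator only ever sees each site’s updated weight, which is the property that makes the approach attractive for sensitive medical data.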
Owkin’s researchers have a number of papers. In a paper in Nature Communications (August 2020) they presented a deep learning model, called HE2RNA, that predicts RNA-Seq expression of tumours from whole slide images.
Plex Research’s (Boston US, 2017) unique AI search engine technology combines breadth, depth and transparency to give you precise answers to the most difficult scientific questions. It ties together large and disparate data sources, and enables scientists to draw powerful and actionable insights from the world’s scientific information, all from a single search bar. Plex impacts all stages of the drug development pipeline and offers:
- Plex Professional, a search engine that connects all types of scientific data (a broad array of public sources and databases) and
- Plex Enterprise, which allows organisations to make the most of their internal scientific data by connecting their own proprietary data and algorithms with the public data available in Plex Professional.
PatSnap (Singapore, 2007) has brought together the world’s most comprehensive R&D dataset in one easy-to-use platform to help innovation leaders analyse tech trends, assess new opportunities, conduct competitor intelligence and maximise return on IP assets. It combines millions of data points from patents, licensing, litigation and company information with non-patent literature, analysing over 114 million chemical structures, clinical trial information, regulatory details, toxicity data, over 121 million patents and other sources, to provide the world’s most innovative organisations with a new, intuitive source of information to accelerate their R&D.
Percayai (Missouri, US, 2018) uses AI to organize and prioritise data in a contextual manner, enabling interactive 3D diagrams illustrating biological information, allowing researchers to rapidly generate testable hypotheses from complex, omic, and multi-omic data sets.
SciNote (Wisconsin US, 2015) is a platform top-rated by 70k+ scientists in 100+ countries across academia and industry. It offers efficient digital lab management with all experimental data in one place: from note-keeping to inventory management, reporting and 21 CFR Part 11 compliance. In SciNote all your data is searchable, accessible and traceable.
Sparrho was founded in 2013 out of frustration with existing literature search tools by two Oxbridge scientists, and now has an amazing team based in London. They use AI to curate — in combination with human expertise — millions of scientific papers from thousands of publications, allowing researchers to stay up-to-date with new scientific publications and patents.
They have aggregated 60M+ research articles and patents, indexed 50K+ unique data sources and have 18K+ content curators (scientists, researchers, PhDs, teachers) across the globe. Sparrho’s platform hosts 155k+ curated collections and three-minute digests, and their content is enhanced by world-class researchers (130k+ scientists) from 1,500+ universities. They are planning to carry forward their success across Asian and European markets and cater to the top 50 pharma companies of the world.
ThoughtSpot (California US, 2012) is a business intelligence platform that helps you explore, analyse and share real-time business analytics data easily. ThoughtSpot’s AI-Driven analytics platform puts the power of a thousand analysts in every business person’s hands. They enable natural language search on billions of rows of data from any source, allowing researchers to speed analysis of clinical trial results and historical genomics data.
ThoughtSpot announced ThoughtSpot 6.2, an update to its search and AI-driven analytics platform, that includes new exploration, collaboration and visualisation capabilities, to help organizations unlock value from their data in record time. With new features like DataFlow, Embrace for SAP and Teradata and the ThoughtSpot bulkloader API, enterprises have more flexibility and choice on how they leverage their data for search and AI-driven analytics, wherever that data originates.
Nference, Inc (Massachusetts US, 2013) offers an AI software platform that helps researchers extract knowledge in real time from commercial, scientific and regulatory literature, allowing them to identify competitive white space, eliminate blind spots in research and discover disease similarities by phenotype for clinical trial design. The platform enables a diverse set of applications ranging from R&D to commercial strategy and operations in the life sciences ecosystem.
In January 2020 it was announced that Nference had closed a $60 million Series B financing to advance its augmented intelligence platform for clinical research and therapeutic development.
Kyndi’s AI platform (California US, 2014) uses ML to streamline regulated business processes, offering auditable AI systems for government, financial services and healthcare. The platform comprises the following AI engines and tools:
- Kyndi’s Discovery Engine,
- Kyndi’s Relevance Engine,
- Kyndi’s Explanation Engine,
- Kyndi’s Lexicon Engines.
In July 2019, Kyndi announced that it had raised $20 million in Series B funding led by Intel Capital, with participation from UL Ventures, PivotNorth Capital and existing investors.
Molecular Health (Heidelberg Germany, 2004) utilises AI to analyse molecular and clinical data of individual patients allowing researchers to improve prediction of drug response and resistance, to design more successful trials and to use molecular evidence for market acceptance. The cloud-based MOLECULAR HEALTH DATAOME platform they offer analyses the molecular and clinical data of individual patients against the world’s medical, biological, and pharmacological knowledge, to drive more precise diagnostic, therapeutic and drug safety decisions.
In July 2020, Centogene, a commercial-stage company focused on rare diseases that transforms real-world clinical and genetic data into actionable information, and Molecular Health announced that they will collaborate exclusively to initiate the Real-life data and Innovative Bioinformatic Algorithms (RIBA) project. RIBA aims to foster a novel precision medicine environment to accelerate, de-risk and improve the development of new orphan drugs by combining large real-life data sets in rare disease with innovative big data and AI methods, computational algorithms and expertise.
OneThree Biotech (New York US, 2018) utilises AI to integrate and analyse over 30 types of chemical, biological and clinical data, allowing researchers to generate new insights across the drug development pipeline, including Target Discovery, Lead Identification, ADME/Toxicity, and Therapeutic Positioning and Biomarkers. This 2-year-old company has just started offering free toxicity screening on potential treatments for COVID-19 and also announced it has closed a $2.5 million seed round, with Primary Venture Partners and Meridian Street Capital as lead investors. OneThree Biotech’s researchers have a number of papers on the arXiv pre-print server and in Cell, Nature and PLOS.
StoneWise (Beijing China, 2018) utilises AI to enable knowledge mining, molecule generation, and property prediction allowing researchers to build knowledge graphs of scientific literature, predict molecular properties, design novel molecules, and perform retro-synthetic analysis. They offer:
- the “me-too drugs” platform with well-curated knowledge graphs and state-of-the-art algorithms for de novo molecule design,
- the “best-in-class” platform that can perform intelligent search among the space of all synthetically accessible compounds, whose size is estimated to be 10⁶⁰ and
- the “first-in-class drug” platform providing technologies to accelerate the discovery of first-in-class drugs, including methods to find novel targets, as well as methods to find effective combinations of well-established targets.
This year they raised nearly $10 million in Series A funding, to quickly discover novel treatments for new and endemic infections.
Wisecube AI (Washington US, 2016) utilises AI to analyse internal and external datasets, allowing researchers to rank small molecules for drug repurposing and to optimise clinical studies. Wisecube uses a multi-stage process called the Safe AI Development Lifecycle (Managed Datasets, Collaborative workspaces, Visual Workflows, Easy Deployments, Centralised Catalog, Integrated community) that ensures continuous model development and monitoring.