Azure for AI/ML

This post was originally published by Allen Manoj at Medium [AI]

Bring the power of containerization and automation.

  • The platform supports visual drag-and-drop authoring.

Data sources

Microsoft Azure supports the following data sources: Azure Blob Storage, a web URL over HTTP, Hadoop via HiveQL, Azure Table Storage, Azure SQL Database, SQL Server on an Azure VM, an on-premises SQL Server database via the Data Manager, and OData.

Data Formats

  • .csv — Comma-Separated Values with a header
  • .nh.csv — Comma-Separated Values with no header
  • .tsv — Tab-Separated Values with a header
  • .nh.tsv — Tab-Separated Values with no header
  • .txt — Plain text
  • .svmlight — SVMlight
  • .arff — Attribute Relation File Format
  • .zip
  • .RData — R object or workspace
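
The difference between the header and no-header variants is easy to see in code. Below is a minimal sketch using only the Python standard library, not Azure's own loader. The function name `read_delimited` and the generated column names `Col1`, `Col2`, ... are assumptions made for illustration.

```python
import csv
import io


def read_delimited(text, has_header=True, delimiter=","):
    """Parse delimited text into (header, rows).

    When has_header is False (the .nh.csv / .nh.tsv case), placeholder
    column names Col1, Col2, ... are generated (an illustrative choice).
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return [], []
    if has_header:
        return rows[0], rows[1:]
    header = [f"Col{i + 1}" for i in range(len(rows[0]))]
    return header, rows


# A .csv-style sample with a header, and a .nh.tsv-style sample without one.
csv_text = "age,income\n34,52000\n45,61000\n"
nh_tsv_text = "34\t52000\n45\t61000\n"

print(read_delimited(csv_text))
print(read_delimited(nh_tsv_text, has_header=False, delimiter="\t"))
```

Picking the wrong variant either turns the first data row into column names or treats the header as a data row, so it is worth checking the file before loading it.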

Explore, Create Summaries

Things to keep in mind.

  • Which features show dependent and independent behavior?
  • Do the features contain outliers?
  • Are there features that would only add noise if used to train the model?
  • Are there trends, patterns, or biases?
  • Why do the attributes have missing values?
  • Which values are rare, and why?
  • Can you see any unusual patterns? What might explain them?
  • How are the observations within each cluster similar to each other?
  • How are the observations within separate clusters different from each other?
  • Identify the missing values.
  • Find the minimum and maximum values.
  • Draw a correlation plot.
  • Use a box plot or scatterplot, and identify the skewness.
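
Several of these checks can be sketched in a few lines of plain Python. The example below is an illustration, not what Azure runs internally. It assumes missing values are represented as `None`, flags outliers with the common 1.5 × IQR rule, and uses "mean greater than median" as a rough hint of right skew. The names `summarize` and `pearson` are hypothetical.

```python
import math
import statistics


def summarize(values):
    """Summarize one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    mean = statistics.mean(present)
    median = statistics.median(present)
    return {
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean,
        "median": median,
        # Values outside the 1.5 * IQR fences are reported as outliers.
        "outliers": [v for v in present if v < low or v > high],
        # Rough skewness hint: a mean above the median suggests a right tail.
        "right_skewed": mean > median,
    }


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric columns."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


column = [10, 12, 11, None, 13, 12, 95]
print(summarize(column))
print(pearson([1, 2, 3], [2, 4, 6]))
```

A single extreme value such as 95 shows up both as an outlier and as a gap between the mean and the median, which is the kind of pattern worth explaining before training.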

Prepare and Clean Data

  • Replace using MICE.
  • Replace using Probabilistic PCA.
  • Custom substitution.
  • Replace with the mean, median, or mode.
  • Remove the entire row or column.
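
The simpler options in this list can be sketched as follows. This is a minimal standard-library illustration that assumes missing values are stored as `None`. MICE and Probabilistic PCA are more involved and are not shown. The function names `impute` and `drop_incomplete_rows` are hypothetical.

```python
import statistics


def impute(values, strategy="mean", fill_value=None):
    """Replace None entries in a column using the chosen strategy."""
    present = [v for v in values if v is not None]
    if strategy == "mean":
        replacement = statistics.mean(present)
    elif strategy == "median":
        replacement = statistics.median(present)
    elif strategy == "mode":
        replacement = statistics.mode(present)
    elif strategy == "custom":
        # Custom substitution: the caller supplies the replacement value.
        replacement = fill_value
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [replacement if v is None else v for v in values]


def drop_incomplete_rows(rows):
    """Remove every row that contains at least one missing value."""
    return [row for row in rows if all(v is not None for v in row)]


print(impute([1, None, 3]))
print(impute([1, None, 3, 3], strategy="mode"))
print(impute([1, None], strategy="custom", fill_value=0))
print(drop_incomplete_rows([[1, 2], [None, 3]]))
```

Dropping rows is the simplest choice but throws data away, so substitution is usually preferred when only a few values are missing.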

Preprocessing

With standard or advanced preprocessing, the data is automatically scaled or normalized to help the algorithm perform well.
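
To make "scaled or normalized" concrete, here is a small sketch of two common transformations, min-max scaling and z-score standardization. Which transformations Azure actually applies is not stated in this post, so treat these as representative examples only. The function names are hypothetical.

```python
import statistics


def min_max_scale(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize values to zero mean and unit (population) variance."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


incomes = [10, 20, 30]
print(min_max_scale(incomes))
print(z_score(incomes))
```

Both transformations put features on comparable scales, which matters for algorithms that are sensitive to feature magnitude.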

Cross-Validation

  • Uses more of the data for testing.
  • Evaluates the dataset as well as the model.
  • Shows how well the model generalizes to new datasets.
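
The idea behind these points is that every observation is used for testing exactly once. A minimal k-fold splitter, written with the standard library only, might look like the sketch below. The name `k_fold_indices` and its parameters are assumptions for illustration, not an Azure API.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for fold_number, (train, test) in enumerate(k_fold_indices(10, k=5), start=1):
    print(fold_number, len(train), len(test))
```

Training and scoring a model once per fold, then comparing the scores, shows both how good the model is and how much the result depends on which rows were held out.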

Model Deployment

Deploy the model for consumption!

  • Azure Container Instance
  • Azure Kubernetes Service
  • Azure IoT Edge
  • Field Programmable Gate Array
  • A configuration file that requests the resources required for the container.
  • A scoring script file that tells the Automated ML deployment how to call the model.
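
As a rough illustration of the scoring script, Azure Machine Learning entry scripts conventionally expose an `init()` function that loads the model once and a `run()` function that handles each request. The sketch below follows that shape using only the standard library. The stand-in model (a function that sums each row) and the `"data"` key in the request payload are assumptions for illustration. A real script would load the trained model from disk instead.

```python
import json

model = None  # populated once by init()


def init():
    """Called once when the container starts; load the model here."""
    global model
    # Placeholder model (assumption): a real script would deserialize
    # the trained model file rather than define a function inline.
    model = lambda row: sum(row)


def run(raw_data):
    """Called for every scoring request with the raw JSON payload."""
    try:
        rows = json.loads(raw_data)["data"]
        return {"predictions": [model(row) for row in rows]}
    except Exception as exc:
        # Return the error to the caller instead of crashing the service.
        return {"error": str(exc)}


init()
print(run(json.dumps({"data": [[1, 2, 3], [4, 5, 6]]})))
```

Keeping the model load in `init()` means the cost is paid once per container start and not on every request.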