ML and Relevancy Entity Structure

This is an entity-centric approach to learning and relevancy. It gives easy views of all relationships between a given set of entities and an unambiguous translation from metadata instructions to engine operations. The specifics of when to train models, run algorithms, and cache data are left to the ML engine, which handles them behind the scenes. Additional metadata may be needed later to expose and fine-tune those operations, but for now this structure should be adequate to move forward, create a generic code structure, and implement our existing functionality as a proof of concept.

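This topic consists of the following sub-topics:

  • ML Service Metadata
  • ML Model Metadata
  • Entity Relation Metadata
  • Relevancy Context Metadata
  • Relevancy Query
  • Relevancy Result Caching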

ML Service Metadata

Machine Learning Services

This entity encapsulates information about an ML Service and the plugin we've written to deal with it.

Fields

  • Name
  • dotNetPlugInID - links to the dotNet Plug Ins record containing the plugin information
  • Username for the service
  • API key for the service

 

SubType: MachineLearningModelTypes

Each record describes one type of model that the service can train. For example, Google Prediction supports CLASSIFICATION or REGRESSION, and BigML lets you choose a single model, a bagged ensemble, or a random decision forest ensemble.

Fields

  • Name
  • Description

SubSubType: MachineLearningModelTypeInputParameters

Fields:

  • Type (dropdown)
  • Optional (boolean)
  • Variadic (boolean)
  • Description
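The three-level structure above (service, model types, input parameters per model type) can be pictured as nested records. The following is only a minimal sketch in Python, not the actual entity implementation: the class names, the snake_case attribute names, and all sample values (credentials, the "integer" parameter type, the plugin ID) are assumptions made for illustration. Only the BigML model type names come from the text above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelTypeInputParameter:
    """One MachineLearningModelTypeInputParameters record."""
    type: str          # value of the Type dropdown (allowed values are assumed)
    optional: bool
    variadic: bool
    description: str = ""


@dataclass
class ModelType:
    """One MachineLearningModelTypes record."""
    name: str
    description: str = ""
    input_parameters: List[ModelTypeInputParameter] = field(default_factory=list)


@dataclass
class MachineLearningService:
    """One Machine Learning Services record with its sub-type records."""
    name: str
    dot_net_plugin_id: int
    username: str
    api_key: str
    model_types: List[ModelType] = field(default_factory=list)


# Sample data: the credentials and the ensemble parameter are hypothetical.
bigml = MachineLearningService(
    name="BigML",
    dot_net_plugin_id=1,
    username="example-user",
    api_key="example-key",
    model_types=[
        ModelType("single model"),
        ModelType(
            "bagged ensemble",
            input_parameters=[
                ModelTypeInputParameter(
                    type="integer",
                    optional=True,
                    variadic=False,
                    description="number of models in the ensemble",
                ),
            ],
        ),
        ModelType("random decision forest ensemble"),
    ],
)
```

Keeping the input parameters under the model type, rather than under the service, mirrors the SubSubType nesting: each model type declares its own parameters.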

ML Model Metadata

Machine Learning Models

This entity encapsulates the data we need in order to get a prediction from a service. That means it knows not only what the predictive endpoint is, but also what data makes up the model, what the feature being predicted is, and the type of model that's been (or needs to be) trained.

Fields

  • Name
  • Description
  • EntityRelationID - links to an Entity Relation, which specifies the Origin and Target entities
  • PredictedFeature - any field from the Views in the FeatureSets subtype
  • MachineLearningServiceID
  • MachineLearningModelType - any model type from the model types on the linked MachineLearningServiceID
  • PMML - Predictive Model Markup Language describing the model. Not all services provide this, but for those that do we should capture it, because it makes moving a model between services possible. This field is read-only: changing the PMML does not change the model being run, so it is kept for archiving purposes only.

 

SubType: MachineLearningModelFeatureSets

Outside services only need one feature set, but some of our own algorithms need more than one.

Fields:

  • ViewID
  • Description

I created a view type for these views to group them for ML. They currently require manual modification in custom SQL because Aptify does not support renaming columns in views and it is programmatically convenient to standardize the column names for feature sets.
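Because PredictedFeature must be a field from one of the views in the FeatureSets subtype, a save-time check could look roughly like the sketch below. This is an illustration under assumptions, not existing code: the function name, the dictionary that maps a ViewID to its column names, and the sample views are all hypothetical.

```python
from typing import Dict, List


def predicted_feature_is_valid(predicted_feature: str,
                               feature_set_columns: Dict[int, List[str]]) -> bool:
    """Return True if predicted_feature is a column of any feature-set view.

    feature_set_columns maps a ViewID to that view's (standardized) column names.
    """
    return any(predicted_feature in columns
               for columns in feature_set_columns.values())


# Hypothetical feature-set views and column names, for illustration only.
views = {
    101: ["PersonID", "ProductID", "Purchased"],
    102: ["PersonID", "Region"],
}
```

With this sample data, "Purchased" would be accepted and a name such as "Price", which appears in neither view, would be rejected.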

 

SubType: MachineLearningModelInputParameters

Validation will ensure that every required input parameter of the selected Model Type is specified here.

Fields:

  • MachineLearningModelTypeInputParameterID
  • Value
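The validation described above amounts to a set difference: the required parameters of the selected Model Type minus the parameters that have a value on the model. A minimal sketch follows, assuming the Optional flag of each model type parameter is available keyed by its ID. The function name and data shapes are hypothetical.

```python
from typing import Dict, Iterable, Set


def missing_required_parameters(type_parameters: Dict[int, bool],
                                supplied_parameter_ids: Iterable[int]) -> Set[int]:
    """Return the IDs of required parameters that have not been supplied.

    type_parameters maps a MachineLearningModelTypeInputParameterID to its
    Optional flag (True means the parameter may be omitted).
    """
    required = {param_id
                for param_id, optional in type_parameters.items()
                if not optional}
    return required - set(supplied_parameter_ids)
```

A model would pass validation when the returned set is empty. How variadic parameters are counted is not specified here, so this sketch ignores the Variadic flag.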

Entity Relation Metadata

Entity Relations

This entity encapsulates an Origin and Target entity.

Fields:

  • Name (computed: OriginEntityName -> TargetEntityName)
  • OriginEntityName
  • TargetEntityName

The Form Template should have an extra tab, Associated Models, with a view of all Machine Learning Models that have this Entity Relation.
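As a small illustration of the computed Name and of what the Associated Models tab would list, consider the sketch below. The class, the function, and the dictionary shape used for models are assumptions for illustration. In practice these would be an entity field and a view rather than Python code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EntityRelation:
    origin_entity_name: str
    target_entity_name: str

    @property
    def name(self) -> str:
        # Computed field: OriginEntityName -> TargetEntityName
        return f"{self.origin_entity_name} -> {self.target_entity_name}"


def associated_models(relation_id: int, models: List[Dict]) -> List[str]:
    """Names of the Machine Learning Models linked to the given Entity Relation."""
    return [m["Name"] for m in models if m["EntityRelationID"] == relation_id]
```

For example, `EntityRelation("Persons", "Products").name` evaluates to `"Persons -> Products"`. The entity names here are placeholders.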

Relevancy Context Metadata

Relevancy Contexts

This entity represents the context for which we want relevancy data. Examples include Tab relevancy and Search relevancy.

Fields:

  • Name
  • Description

SubType: RelevancyContextRelations

A validation script or process flow will ensure that no two RelevancyContextRelations on the same Relevancy Context have the same origin and target; relevancy requests must be unambiguous!

Fields:

  • MLModelID
  • MLModelID_Name (virtual)
  • MLModelID_OriginEntityName (virtual)
  • MLModelID_TargetEntityName (virtual)
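The uniqueness rule above can be checked by counting (origin, target) pairs across the relations of one Relevancy Context. The following is a sketch only: the function name and the tuple representation of a relation are assumptions, and the real check would live in a validation script or process flow.

```python
from collections import Counter
from typing import List, Tuple


def ambiguous_pairs(relations: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return every (origin, target) pair that appears more than once.

    Each tuple holds the MLModelID_OriginEntityName and
    MLModelID_TargetEntityName of one RelevancyContextRelations record.
    """
    counts = Counter(relations)
    return sorted(pair for pair, count in counts.items() if count > 1)
```

A context is valid when the function returns an empty list. Any pair it returns would make a relevancy request for that pair ambiguous, because more than one model could answer it.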

 

Relevancy Query

Relevancy Queries

This is the structure of the form data that must be sent to the Relevancy Service:

context: "contextname"

origintargetpairs: {
  { "originentityname": {comma-delimited list of origin record IDs}, "targetentityname": {comma-delimited list of target record IDs} },
  {etc.},
  {etc.}
}
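To make the shape of the request more concrete, here is a hedged sketch of how a client might assemble the two form fields. The exact wire encoding is not specified above, so the choice to JSON-encode origintargetpairs as a list of objects is an assumption, as are the function name and the sample entity names.

```python
import json
from typing import Dict, List, Tuple

# (origin entity name, origin record IDs, target entity name, target record IDs)
Pair = Tuple[str, List[int], str, List[int]]


def build_relevancy_query(context: str, pairs: List[Pair]) -> Dict[str, str]:
    """Build form data with a context name and a list of origin/target pairs.

    Each pair maps the entity names to comma-delimited record ID strings.
    Note: because entity names are used as keys, an origin and target with
    the same entity name would collide in this layout.
    """
    origin_target_pairs = [
        {
            origin_entity: ",".join(str(i) for i in origin_ids),
            target_entity: ",".join(str(i) for i in target_ids),
        }
        for origin_entity, origin_ids, target_entity, target_ids in pairs
    ]
    return {
        "context": context,
        "origintargetpairs": json.dumps(origin_target_pairs),
    }


# Hypothetical request: relevancy of two Products to two Persons.
query = build_relevancy_query("Search", [("Persons", [1, 2], "Products", [10, 11])])
```

The returned dictionary could then be posted as form data to the Relevancy Service by whatever HTTP client is in use.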

 

Relevancy Result Caching

ML Multi-Model Caches

Each origin->target entity pair has a distinct ML Multi-Model Cache, but all models with the same origin->target entity pair use the same ML Multi-Model Cache record.

Fields

  • EntityRelationID - links to the origin->target entity pair
  • EntityRelationID_Name (virtual)

SubType: RecordRelationCaches

Fields:

  • OriginRecordID
  • TargetRecordID

SubSubType: RecordRelationCacheWeights

Fields:

  • MachineLearningModelID - restricted to models that have the same EntityRelation as this ML Multi-Model Cache
  • Weight
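The cache layout above is effectively a two-level lookup: a record pair (OriginRecordID, TargetRecordID) leads to one weight per model. The sketch below models that in memory and enforces the restriction that only models sharing the cache's Entity Relation may store weights. The class and method names, and the mapping from model ID to EntityRelationID, are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple


class MultiModelCache:
    """In-memory stand-in for one ML Multi-Model Cache record."""

    def __init__(self, entity_relation_id: int, model_relations: Dict[int, int]):
        self.entity_relation_id = entity_relation_id
        # Maps MachineLearningModelID -> EntityRelationID of that model.
        self._model_relations = model_relations
        # (OriginRecordID, TargetRecordID) -> {MachineLearningModelID: Weight}
        self._weights: Dict[Tuple[int, int], Dict[int, float]] = {}

    def set_weight(self, origin_record_id: int, target_record_id: int,
                   model_id: int, weight: float) -> None:
        # Only models with the same Entity Relation as this cache are allowed.
        if self._model_relations.get(model_id) != self.entity_relation_id:
            raise ValueError(
                f"model {model_id} does not use entity relation "
                f"{self.entity_relation_id}"
            )
        key = (origin_record_id, target_record_id)
        self._weights.setdefault(key, {})[model_id] = weight

    def get_weight(self, origin_record_id: int, target_record_id: int,
                   model_id: int) -> Optional[float]:
        """Return the cached weight, or None if nothing is cached."""
        key = (origin_record_id, target_record_id)
        return self._weights.get(key, {}).get(model_id)
```

Storing the weights per model under a shared record pair is what lets all models with the same origin->target entity pair reuse a single cache record.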

 

This updated UML diagram represents the first functional build of the entities (minus the web MLService plugin subtype, oops!)

[Image: UML Diagram for ML and Relevancy Entities.png]

 
