This is an entity-centric approach to learning and relevancy that results in easy views of all relationships between a given set of entities and an unambiguous translation from metadata instructions to engine operations. The specifics of when to train models, run algorithms, and cache data are left to the ML engine to operate behind the scenes. Some metadata may be required to expose and fine-tune those operations over time, but for now this should be adequate to move forward, create a generic code structure, and implement our existing functionality as proof of concept.
This topic consists of the following sub-topics:
- ML Service Metadata
- ML Model Metadata
- Entity Relation Metadata
- Relevancy Context Metadata
- Relevancy Query
- Relevancy Result Caching
ML Service Metadata
Machine Learning Services
This entity encapsulates information about an ML Service and the plugin we've written to deal with it.
Fields:
- Name
- dotNetPlugInID - links to dotNet Plug Ins record containing plugin information
- Username for the service
- API key for the service
SubType: MachineLearningModelTypes
For example, Google Predictions supports CLASSIFICATION or REGRESSION, and BigML lets you choose a single model, a bagged ensemble, or a random decision forest ensemble.
Fields:
- Name
- Description
SubSubType: MachineLearningModelTypeInputParameters
Fields:
- Type (dropdown)
- Optional (boolean)
- Variadic (boolean)
- Description
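The service, model type, and input parameter records above nest three levels deep. A minimal sketch of that shape, assuming Python dataclasses as a stand-in for the real entities: the class names, the `id` field, and every sample value (including the "number of trees" parameter) are hypothetical and only illustrate how the levels link together.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelTypeInputParameter:
    """One MachineLearningModelTypeInputParameters record."""
    id: int            # hypothetical record ID, used later to attach values
    type: str          # value of the Type dropdown, e.g. "integer"
    optional: bool
    variadic: bool
    description: str = ""


@dataclass
class ModelType:
    """One MachineLearningModelTypes record."""
    name: str
    description: str = ""
    input_parameters: List[ModelTypeInputParameter] = field(default_factory=list)


@dataclass
class MachineLearningService:
    """One Machine Learning Services record."""
    name: str
    dotnet_plugin_id: int
    username: str
    api_key: str
    model_types: List[ModelType] = field(default_factory=list)


# Illustrative values only; none of these come from a real configuration.
bigml = MachineLearningService(
    name="BigML",
    dotnet_plugin_id=1,
    username="example-user",
    api_key="example-key",
    model_types=[
        ModelType("single model"),
        ModelType(
            "random decision forest ensemble",
            input_parameters=[
                ModelTypeInputParameter(
                    1, "integer", optional=True, variadic=False,
                    description="number of trees (hypothetical)",
                ),
            ],
        ),
    ],
)
```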
ML Model Metadata
Machine Learning Models
This entity encapsulates the data we need in order to get a prediction from a service. That means it knows not only what the predictive endpoint is, but also what data makes up the model, what the feature being predicted is, and the type of model that's been (or needs to be) trained.
Fields:
- Name
- Description
- EntityRelationID - links to an Entity Relation, which specifies the Origin and Target entities
- PredictedFeature - any field from the Views in the FeatureSets subtype
- MachineLearningServiceID
- MachineLearningModelType - any model type from the model types on the linked MachineLearningServiceID
- PMML - Predictive Model Markup Language describing the model – not all services provide this, but for those that do we should grab it; it will make moving a model between services possible. This is a read-only field since changing the PMML will not actually change the model being run – it's for archiving purposes.
SubType: MachineLearningModelFeatureSets
Outside services only need one feature set, but some of our own algorithms need more than one.
Fields:
- ViewID
- Description
I created a view type for these views to group them for ML. They currently require manual modification in custom SQL because Aptify does not support renaming columns in views and it is programmatically convenient to standardize the column names for feature sets.
SubType: MachineLearningModelInputParameters
Validation will ensure that every required input parameter of the selected Model Type is specified here.
Fields:
- MachineLearningModelTypeInputParameterID
- Value
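The validation rule for input parameters can be sketched as follows. This is an illustration under assumptions, not the actual validation script: `TypeParameter` and `missing_required_parameters` are hypothetical names, and a blank Value is treated as missing, which the text above does not state.

```python
from typing import Dict, Iterable, List, NamedTuple


class TypeParameter(NamedTuple):
    """The part of a MachineLearningModelTypeInputParameter the check needs."""
    id: int
    optional: bool


def missing_required_parameters(
    type_parameters: Iterable[TypeParameter],
    model_values: Dict[int, str],
) -> List[int]:
    """Return IDs of required parameters with no usable value on the model.

    model_values maps MachineLearningModelTypeInputParameterID -> Value.
    Assumption: a blank value counts as missing.
    """
    return [
        p.id
        for p in type_parameters
        if not p.optional and model_values.get(p.id, "").strip() == ""
    ]
```

A model would pass validation only when the returned list is empty.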
Entity Relation Metadata
Entity Relations
This entity encapsulates an Origin and Target entity.
Fields:
- Name (computed: OriginEntityName -> TargetEntityName)
- OriginEntityName
- TargetEntityName
The Form Template should have an extra tab, Associated Models, with a view of all Machine Learning Models that have this Entity Relation.
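As a rough illustration of the computed Name field and the Associated Models tab, the sketch below derives the name from the two entity names and filters models by relation. The class names, the `id` fields, and the `associated_models` helper are hypothetical; the real tab would be an ordinary view rather than code.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class EntityRelation:
    id: int
    origin_entity_name: str
    target_entity_name: str

    @property
    def name(self) -> str:
        # Computed field: OriginEntityName -> TargetEntityName
        return f"{self.origin_entity_name} -> {self.target_entity_name}"


@dataclass(frozen=True)
class MachineLearningModel:
    id: int
    name: str
    entity_relation_id: int


def associated_models(
    relation: EntityRelation, models: Iterable[MachineLearningModel]
) -> List[MachineLearningModel]:
    """Models that would appear on the relation's Associated Models tab."""
    return [m for m in models if m.entity_relation_id == relation.id]
```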
Relevancy Context Metadata
Relevancy Contexts
This entity represents the context for which we want relevancy data. For example, Tab relevancy, Search relevancy, etc.
Fields:
- Name
- Description
SubType: RelevancyContextRelations
A validation script or process flow will ensure that no two RelevancyContextRelations on the same Relevancy Context have the same origin and target; relevancy requests must be unambiguous!
Fields:
- MLModelID
- MLModelID_Name (virtual)
- MLModelID_OriginEntityName (virtual)
- MLModelID_TargetEntityName (virtual)
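The uniqueness rule on RelevancyContextRelations amounts to a duplicate check over (origin, target) pairs. A minimal sketch, assuming the virtual fields are already resolved on each relation; `ContextRelation` and `ambiguous_pairs` are hypothetical names, and the entity names in the usage are examples only.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class ContextRelation(NamedTuple):
    ml_model_id: int
    origin_entity_name: str   # MLModelID_OriginEntityName
    target_entity_name: str   # MLModelID_TargetEntityName


def ambiguous_pairs(relations: Iterable[ContextRelation]) -> List[Tuple[str, str]]:
    """Return every (origin, target) pair used by more than one relation."""
    counts = Counter(
        (r.origin_entity_name, r.target_entity_name) for r in relations
    )
    return sorted(pair for pair, n in counts.items() if n > 1)
```

A Relevancy Context would be rejected whenever this returns a non-empty list, since a request for that pair could not be routed to a single model.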
Relevancy Query
Relevancy Queries
This is the structure of the form data that must be sent to the Relevancy Service:
context: "contextname"
origintargetpairs: {
{"originentityname": {comma-delimited list of origin record IDs}, "targetentityname": {comma-delimited list of target record IDs}},
{etc.},
{etc.}
}
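One possible way to assemble this form data is sketched below. The wire encoding is not specified above, so serializing `origintargetpairs` as a JSON string is an assumption, as are the function name `build_relevancy_form` and the sample context and entity names.

```python
import json
from typing import Dict, Iterable, Sequence, Tuple
from urllib.parse import urlencode

# (origin entity name, origin record IDs, target entity name, target record IDs)
Pair = Tuple[str, Sequence[int], str, Sequence[int]]


def build_relevancy_form(context: str, pairs: Iterable[Pair]) -> Dict[str, str]:
    """Build the two form fields for a relevancy request."""
    encoded = []
    for origin, origin_ids, target, target_ids in pairs:
        if origin == target:
            # Keys are entity names, so identical names would collide.
            raise ValueError("origin and target entity names must differ")
        encoded.append({
            origin: ",".join(str(i) for i in origin_ids),
            target: ",".join(str(i) for i in target_ids),
        })
    # Assumption: the pairs are sent as a single JSON-encoded field.
    return {"context": context, "origintargetpairs": json.dumps(encoded)}


form = build_relevancy_form("Search", [("Persons", [1, 2], "Products", [10])])
body = urlencode(form)  # URL-encoded body, ready to POST
```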
Relevancy Result Caching
ML Multi-Model Caches
Each origin->target entity pair has a distinct ML Multi-Model Cache, but all models with the same origin->target entity pair use the same ML Multi-Model Cache record.
Fields:
- EntityRelationID - links to the origin->target entity pair
- EntityRelationID_Name (virtual)
SubType: RecordRelationCaches
Fields:
- OriginRecordID
- TargetRecordID
SubSubType: RecordRelationCacheWeights
Fields:
- MachineLearningModelID - restricted to models that have the same EntityRelation as this ML Multi-Model Cache
- Weight
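The three cache levels above can be pictured as one cache per entity relation, keyed by record pair, holding one weight per model. The in-memory sketch below is only an illustration of that nesting; the class and method names are hypothetical, and the real cache is stored as entity records.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


class MultiModelCache:
    """One ML Multi-Model Cache: all weights for one origin->target relation."""

    def __init__(self, entity_relation_id: int, model_ids: Iterable[int]):
        self.entity_relation_id = entity_relation_id
        # Only models sharing this cache's EntityRelation may store weights.
        self._allowed_models = set(model_ids)
        # (OriginRecordID, TargetRecordID) -> {MachineLearningModelID: Weight}
        self._weights: Dict[Tuple[int, int], Dict[int, float]] = defaultdict(dict)

    def set_weight(self, origin_id: int, target_id: int,
                   model_id: int, weight: float) -> None:
        if model_id not in self._allowed_models:
            raise ValueError(f"model {model_id} does not use this entity relation")
        self._weights[(origin_id, target_id)][model_id] = weight

    def weights_for(self, origin_id: int, target_id: int) -> Dict[int, float]:
        """Return a copy of the per-model weights for one record pair."""
        return dict(self._weights.get((origin_id, target_id), {}))


cache = MultiModelCache(entity_relation_id=1, model_ids=[7, 8])
cache.set_weight(100, 200, model_id=7, weight=0.9)
cache.set_weight(100, 200, model_id=8, weight=0.4)
```

Keeping every model's weight under the same record pair is what lets several models share one cache record per origin->target relation.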
This updated UML diagram represents the first functional build of the entities (minus the web MLService plugin subtype, oops!)