The Machine Learning Handbook

These documents are intended to serve not only as technical documentation for the Aptify Machine Learning Engine (AMLE), but also as a primer for the concepts on which it is based. AMLE is a tool that will require input and cooperation from all parts of the organization in order to be used to its fullest. Think of it as another item in our toolbox, able to make anyone's job a little easier when applied creatively. It's not a magic box, and its core concepts are within the grasp of anyone, regardless of background (or lack thereof) in computing, so I encourage all members of the Aptify family to at least skim through.

Core Concepts

Machine learning is one of many elements of data science. It is fundamentally a tool to help find and quantify relationships between different pieces of data. Consider if I asked you to suggest five people I should try to befriend. You would probably think about my interests, who I am already friends with, how much time I spend with those people, and what our common interests are. From there you might look at who my friends' friends are and find the five of them who are most similar to the friends I spend the most time with and who share the most interests with me. From this description you can probably guess that there is a systematic method of sorting through that information and calculating who I am most likely to want to befriend, and if you guessed this is what Facebook does then you'd be exactly right!

That brings us to our first core concept: machine learning consists of systematic methods (algorithms) to find implied relationships (correlations) in data. Whether or not these correlations are correct in any useful sense is another matter. I could probably come up with a pretty convincing chart showing that computer usage causes lower-back pain and postural problems, but that doesn't change the fact that sitting all day is the real problem. Conversely, I could show you a graph suggesting that sitting causes carpal tunnel syndrome, rather than the keyboard use that actually causes it.

Core concept number two is therefore that machine learning should be treated with skepticism. If you can't think of a good reason why the machine is telling you two things are related, chances are they're not. The carpal tunnel example shows that, while certain elements may be correlated, one may not actually be the cause of the other. If we attempted to extrapolate the carpal tunnel implication to television viewers, we would be sorely disappointed by the results of our prediction; likewise, I don't hear any complaints of back pain from those using their computers at a standing desk here in the NOLA office! This is illusory correlation, a concept familiar to anyone who has studied statistics.

In fact, our third concept is that machine learning is applied statistics, and its results are therefore only as good as the data you give it. Facebook friend recommendations are a good example of this. Friends are recommended based on correlations between their interests and yours, as well as your common friends. This data is narrow and specific to you, which is what makes the results good. Suggested friends would not be very relevant at all if they were based on all users' interests, or on friends of Kevin Bacon, but because you told Facebook what you like and who you like, it's able to narrow its algorithms' focus to that data and connect you with people you may know or like.

This brings us to the fourth and most important concept: machine learning exists to help you make your life easier, but you have to help it first. Learning is based on experience. You can drive a car while holding a conversation, not because driving a car or holding a conversation are easy tasks, but because you've done them so many times that you don't have to think about what to do. The behavior is hard-wired thanks to changes that accumulated over years of experience. You didn't learn to do those things on your own though. Your parents probably taught you to do both of those things, and a lot of other things. Based on their feedback, you learned what was correct and incorrect and adjusted your behavior to match.

A machine learning algorithm has to be trained as well. Data is given to the machine as input and it attempts to predict what its output should be based on previous examples. If you label the data it will attempt to match its output to your labels (supervised learning), but if you don't then it will just categorize the data (unsupervised learning). Some algorithms don't quite work like this and are actually a lot less flexible, and unlike a human the computer does not have intrinsic motivation to do anything other than what you instruct it to do.

For example, if you want the machine to make predictions about who will buy your product, first you have to tell the machine who has bought your product, some information about those people, and, just as importantly, who has not bought your product. By providing both positive and negative examples you can create (or "train") a model that classifies information much more accurately than one that has access only to positive (or only to negative) examples. Once it's trained it can also tell you what people who buy or don't buy your product are most likely to have in common. If you already know what makes people more likely to buy or not buy your product (maybe because they told you or you found a couple key factors in a model), then you can skip the complex learning model and use one of the static (non-learning) algorithms to efficiently map that relationship for you, or even use those in combination with a trained model. This guide will make all of this understandable and easy (let us know if you have questions it doesn't answer!). The following sections describe what the Aptify Machine Learning Engine is, how to use it, examples of its usage, and some features I would like to add.


High-Level Structure

The Aptify Machine Learning Engine (AMLE) is a generic engine that allows any type of machine learning (ML) algorithm, from any service provider, to interact with data, other ML algorithms, and the user, in a uniform manner. The key concept of its operation is the data structure it uses to represent data relationships, which is a graph.

For those unfamiliar with graph theory, the term graph here represents a special type of mathematical object, not a method of presenting data (although drawing the structure of a graph is itself a useful presentation of data!). A graph consists of two basic components: nodes (vertices) and relationships (edges). Nodes are usually used to represent distinct entities, such as people, and relationships usually represent abstract connections between nodes, like friendship, although in the case of representing a power grid or water distribution system the relationships may represent real objects like power lines or pipes. (In formal graph theory, the formulae representing graphs and their operations are usually expressed in terms of V and E for vertices and edges, but we'll use the terms node and relationship because they are more conversational and data centric.)
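To make the idea concrete, here is a minimal sketch (illustrative only, not AMLE code) of a small directed, weighted graph in C#: each edge records an origin node, a target node, and a weight for the strength of the relationship.

using System;
using System.Collections.Generic;

class GraphSketch
{
    static void Main()
    {
        // Each tuple is one edge: (origin node, target node, weight).
        // Here the nodes are people and the weights are "friendship strength".
        var edges = new List<(string Origin, string Target, double Weight)>
        {
            ("Alice", "Bob",   0.9),
            ("Alice", "Carol", 0.4),
            ("Bob",   "Carol", 0.7)
        };

        foreach (var e in edges)
            Console.WriteLine($"{e.Origin} -[{e.Weight}]-> {e.Target}");
    }
}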

In AMLE, every Machine Learning Model has just a few properties that you need to worry about. These are feature sets, input parameters, and the output graph. Every model has exactly one graph associated with it but may have multiple feature sets and input parameters. Simple models tend to use only a single feature set, but trainable models and models that use other models to find higher-order relationships can have multiple feature sets. The most technical aspect of configuring models is choosing their input parameters because these configure the algorithm itself and alter its behavior.

Feature Sets

Feature sets are the data from which you want to compute relationships. A typical feature set's columns consist of two record IDs and a piece of data, such as a date, time, or numeric value, that represents some sort of interaction between the two records. Trainable models also take in a lot of information about each record, like a person's address and age and a product's cost and category, in order to make predictions based on the properties of those entities. We currently represent Feature Sets as DB Objects containing SQL Views in Aptify.
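As a minimal sketch (the column names here are illustrative, not taken from a real view), a row of the typical feature set described above might be shaped like this:

using System;

// Illustrative shape of one feature-set row: two record IDs plus one
// piece of data representing an interaction between the two records.
class FeatureSetRow
{
    public long PersonID;        // first record ID (e.g. a Persons record)
    public long ProductID;       // second record ID (e.g. a Products record)
    public DateTime PurchasedOn; // the interaction datum: when the purchase happened
}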

Input Parameters

Input parameters vary widely depending on the Machine Learning Model Type. Some examples are:

  • bounds on dates and times to use from feature sets that use those, for instance computing only on records from the last 30 days
  • coefficients that alter the results of calculations within algorithms

Choosing input parameters often requires a strong background in computing, math, statistics, or data science. Documentation and experience with each algorithm make it easier to recommend default values for certain situations.
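As a hedged illustration (these parameter names are hypothetical, not taken from any stock model), the input parameters for a date-bounded, coefficient-driven model might reach an algorithm as a dictionary like this:

using System;
using System.Collections.Generic;

class InputParameterSketch
{
    static void Main()
    {
        // Hypothetical input parameters, for illustration only.
        var inputParameters = new Dictionary<string, object>
        {
            { "DaysBack", 30 },          // only use feature-set records from the last 30 days
            { "WeightCoefficient", 0.5 } // scales the weights the algorithm computes
        };

        DateTime cutoff = DateTime.Now.AddDays(-(int)inputParameters["DaysBack"]);
        Console.WriteLine($"Only records after {cutoff} will be used.");
    }
}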

Output Graph

The output of any model is a graph representing a relationship between two Entities. This means that all machine learning operations are entity centric. The first thing that must be done before creating a model is defining which two entities will be used, such as Persons and Products if you want to create a Persons to Products relationship. The resulting data structure uses the terminology of Origin Entity to represent Persons, Target Entity to represent Products, Origin ID to represent a single Persons record, and Target ID to represent a single Products record. Origins (Origin IDs) and targets (Target IDs) appear in pairs, with each pair having a weight, a label, or both associated with it, representing how the two are related, such as "Has Bought." The terms origin and target lend themselves to directed graphs in which the origin is said to point to the target, but AMLE is also configurable to produce undirected relationships for models and data that support it. A directed relationship implies some sort of causality or asymmetry, while an undirected relationship represents a connection that is symmetrical or in which cause is inapplicable or unknown.
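Because the base plugin class shown later expects model output as a DataTable with the columns OriginRecordID, TargetRecordID, and Weight, a minimal sketch of an output graph (with made-up IDs and weights) looks like this:

using System;
using System.Data;

class OutputGraphSketch
{
    static void Main()
    {
        // One row per origin/target pair, weighted by relationship strength.
        DataTable output = new DataTable();
        output.Columns.Add("OriginRecordID", typeof(long));
        output.Columns.Add("TargetRecordID", typeof(long));
        output.Columns.Add("Weight", typeof(double));

        output.Rows.Add(1001L, 42L, 0.83); // e.g. Person 1001 -> Product 42
        output.Rows.Add(1001L, 57L, 0.12); // e.g. Person 1001 -> Product 57

        foreach (DataRow row in output.Rows)
            Console.WriteLine($"({row["OriginRecordID"]}) -> ({row["TargetRecordID"]}): {row["Weight"]}");
    }
}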

Example:

One of the stock models creates a Persons to Form Template Parts relationship. The feature set used for this is data gathered from the web client about which tab within an entity's menu the person has interacted with, for instance the Plug-Ins menu on Entities. This feature set consists of the columns PersonID, TabID (Form Template Part ID), TimeStart, and TimeEnd, representing the current user, the current tab menu of an entity record, the time the person opened the tab, and the time the person left the tab. The number of times the person viewed the tab for more than a few seconds in the last 90 days is tabulated, and this count is assigned as the weight from Person to Form Template Part for that person and tab. We use this in the web client to sort an entity record's tabs, for each entity, by most-used tab, and create shortcuts to the user's most-used tabs next to the dropdown menu on the record.
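A sketch of that tabulation (my own illustration, not the stock model's actual code, and assuming 64-bit ID columns) might look like this, using LINQ over the feature set:

using System;
using System.Data;
using System.Linq;

static class TabViewTabulationSketch
{
    // Counts, per (PersonID, TabID) pair, the views longer than a few
    // seconds within the last 90 days. The thresholds are illustrative.
    public static DataTable Tabulate(DataTable featureSet)
    {
        DateTime cutoff = DateTime.Now.AddDays(-90);
        TimeSpan minDuration = TimeSpan.FromSeconds(3);

        var counts = featureSet.AsEnumerable()
            .Where(r => r.Field<DateTime>("TimeStart") >= cutoff &&
                        r.Field<DateTime>("TimeEnd") - r.Field<DateTime>("TimeStart") >= minDuration)
            .GroupBy(r => new { PersonID = r.Field<long>("PersonID"),
                                TabID = r.Field<long>("TabID") });

        // Output in the engine's standard format: OriginRecordID, TargetRecordID, Weight.
        DataTable output = new DataTable();
        output.Columns.Add("OriginRecordID", typeof(long));
        output.Columns.Add("TargetRecordID", typeof(long));
        output.Columns.Add("Weight", typeof(double));

        foreach (var g in counts)
            output.Rows.Add(g.Key.PersonID, g.Key.TabID, (double)g.Count());

        return output;
    }
}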

The next section will go in-depth on the structure of AMLE's metadata and how to configure it.

Metadata Structure and Usage

The UML diagram of AMLE's entities will be your best reference for this section:

MLSchema_20141003.png

Machine Learning Services

A Machine Learning service is a local or remote service that can make predictions based on models. (Many ML services can also generate models from a dataset, but AMLE does not currently support any automated model generation.) The Machine Learning Services entity represents these services in Aptify, complete with a plugin that handles service-specific operations.

Machine Learning Service Model Types

A Machine Learning service supports different types of models. For example, BigML supports single models along with bagged and random decision forest ensembles, while Google Predictions supports classification and regression model types. The Machine Learning Service Model Types define model types on a per-service basis and provide a template against which a Machine Learning Model using that model type is validated.

Machine Learning Entity Relations

A Machine Learning model is abstractly represented in AMLE as producing a weight or label edge that links an origin record node to a target record node. (See the discussion of this graph representation in the High-Level Structure section above.) The Machine Learning Entity Relations define Origin / Target entity pairs.

Machine Learning Models

A Machine Learning model is an algorithm that, given some data, produces a prediction based on that data. For example, given a Person and data about that Person's past purchases, it might predict how likely that Person is to buy a Product. The Machine Learning Models entity defines metadata around these models, specifying which Machine Learning Service it uses, which Machine Learning Service Model Type it instantiates, and how to configure that Model Type for this particular model.

Input Parameters are values used to configure the behavior of the model. Their allowable configurations are defined and constrained by the Model Type Input Parameters for the chosen Model Type. Some knowledge of how the algorithm operates is necessary to configure this metadata, so this is the most likely area where you will need assistance from a data scientist; detailed recommendations will be developed over time as we gain experience with the algorithms, making this configuration more approachable to less technical users.

Feature Sets define the input data for the model. Their properties are defined and constrained by the Model Type Feature Sets for the chosen Model Type. Model Feature Sets link to views defined in Database Objects. The views must meet the data-type and aliasing requirements of the Model Type's Feature Sets. Plural case is used throughout here because some Model Types accept or require multiple feature sets.

Feature Set Column Mappings define the mapping between the column names as they appear in the Feature Set and as they are programmatically referenced inside the model itself. Only Services and Model Types where these names differ require the Column Mapping; the default assumption is that the model references the columns by their Feature Set names.
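For example (with hypothetical names), if a Model Type's algorithm references OriginRecordID and TargetRecordID internally but the feature set view exposes PersonID and ProductID, the mapping conceptually amounts to:

using System.Collections.Generic;

class ColumnMappingSketch
{
    // Hypothetical mapping: feature-set column name -> name referenced inside the model.
    static readonly Dictionary<string, string> Mapping = new Dictionary<string, string>
    {
        { "PersonID",  "OriginRecordID" },
        { "ProductID", "TargetRecordID" }
    };
}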

Labels define the set of possible labels for labeled models; they are ignored for weighted models.

Once a model is created it will generate data, but that data will not be accessible until a Relevancy Context is defined to access it.

Relevancy Contexts

Relevancy Contexts are used to decouple public data access from the details of the models behind them. A context is defined by a unique Name and has Machine Learning Models associated with it through Relevancy Context ML Models. A context may have any number of associated models, but only one model per Machine Learning Entity Relation may be active at once. By decoupling public access from the actual models, applications utilizing relations may be given their own context(s) and retrieve data knowing only their context and the two entities to be related. This allows the way relationship data is generated to be changed, by swapping which model is active, without requiring any changes to the applications leveraging that data.

Relevancy Context Categories

Relevancy Context Categories are used to organize Relevancy Contexts. They have no meaning to the backend system.

Multi-ML Model Caches

The relationship data that models output is stored in a multi-level cache oriented around specific Machine Learning Entity Relations. A Multi-ML Model Cache is created for each Machine Learning Entity Relation that is being used by a Machine Learning Model. Each record ID pair that a relationship is generated for is associated with an ML Entity Relation Cache that links to the records' entities through the Multi-ML Model Cache. Each model that produces a relationship for any particular record ID pair produces an ML Entity Relation Weight Cache record and/or an ML Entity Relation Label Cache record that links to the Model and the associated records' ML Entity Relation Cache. This scheme avoids data duplication, and the weights from all models associated with a given record pair are contained in the same table. The caches are machine-generated and should be treated as read-only; modification of values in the cache may cause unexpected behavior if any models use cached output data from another model as a feature set. All public access to relationship data queries the cache in order to leverage row-set security.
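A structure-only sketch of that hierarchy (illustrative classes, not the real schema) may help:

using System.Collections.Generic;

// One cache per Machine Learning Entity Relation in use.
class MultiMLModelCache
{
    public List<MLEntityRelationCache> Pairs;
}

// One record per (origin, target) ID pair with at least one relationship.
class MLEntityRelationCache
{
    public long OriginRecordID, TargetRecordID;
    public List<MLEntityRelationWeightCache> Weights; // one per model weighting this pair
}

// One weight per model per ID pair; label caches follow the same pattern.
class MLEntityRelationWeightCache
{
    public long ModelID;
    public double Weight;
}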


Code Structure and Usage

The classes for the Aptify Machine Learning Engine are broken down into two categories: Machine Learning Engine, containing Plugin and Process Flow code, and Relevancy Engine. Machine Learning Engine classes do the heavy number crunching and translate from metadata to service-specific operations, while Relevancy Engine classes translate public relevancy requests (Origin Entity, Target Entity, Context) into Machine Learning Engine instructions and cache queries.

 

Machine Learning Service Plugins

The Machine Learning Engine Service Plugins are based on an abstract class MachineLearningServicePluginBase. The methods implemented in the base class take care of parsing metadata into dictionaries and updating the output cache. A Run method is supplied that automates all of this, leaving plugins to implement a single overloaded Run method. The metadata is parsed into a top-level dictionary containing all of the model metadata, including which algorithm will run; nested inside it are two more dictionaries, one of feature sets and one of input parameters.
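Concretely, the keys a plugin reads from that top-level dictionary are the same ones the Aptify ML plugin further below uses:

// Layout of the dictionary a plugin's overloaded Run receives:
Dictionary<string, Object> featureSetViews =
    (Dictionary<string, Object>)parameters["FeatureSets"];     // feature set name -> view name
Dictionary<string, Object> inputParameters =
    (Dictionary<string, Object>)parameters["InputParameters"]; // parameter name -> value
string modelType = (string)parameters["ModelType"];            // selects the algorithm to run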

 

public void Run(UserCredentials uc, long modelID, long[] originIDs = null, long[] targetIDs = null)
{
    if (modelID > 0)
    {
        DataSet metadata = getMetadata(uc, modelID);
        Dictionary<string, Object> parameters = parseParameters(metadata, originIDs, targetIDs);
        // Parameter metadata is validated when the record is saved, so no validation is needed here or in plugins.
        if (parameters != null)
        {
            // DataTable format: OriginRecordID, TargetRecordID, Weight
            DataTable updatedRelationships = this.Run(uc, parameters);
            if (updatedRelationships != null && updatedRelationships.Rows.Count > 0)
            {
                updateCache(uc, modelID, updatedRelationships);
            }
        }
    }
}
 
private DataSet getMetadata(UserCredentials uc, long modelID)
{
    DataSet metadata = new DataSet();
    DataAction da = new DataAction(uc);
    string query = "SELECT * FROM vwMLModelMetadata WHERE ModelID = " + modelID;
    DataTable modelData = da.GetDataTable(query);
    modelData.TableName = "ModelData";
    metadata.Tables.Add(modelData);
    query = "SELECT vwMLModelFeatureSets.MLModelTypeFeatureSetID_Name AS Name, vwDBObjects.Name AS FeatureSetView ";
    query += "FROM vwMLModelFeatureSets JOIN vwDBObjects ON DBObjectID = vwDBObjects.ID ";
    query += "WHERE MachineLearningModelID = " + modelID;
    DataTable featureSets = da.GetDataTable(query);
    featureSets.TableName = "FeatureSets";
    metadata.Tables.Add(featureSets);
    query = "SELECT * FROM vwMLModelParameters WHERE ModelID = " + modelID;
    DataTable parameters = da.GetDataTable(query);
    parameters.TableName = "InputParameters";
    metadata.Tables.Add(parameters);
    return metadata;
}

 

From here it's easy to implement a plugin class because all of the data arrives in a consistent format that is simple to read and write code against. The Aptify ML plugin shows how the information is routed to the correct algorithm.

 

public class AptifyMachineLearningService : MachineLearningServicePluginBase
{
    protected override DataTable Run(UserCredentials uc, Dictionary<string, Object> parameters)
    {
        DataTable result = null;
        Dictionary<string, Object> featureSetViews = (Dictionary<string, Object>)parameters["FeatureSets"],
            inputParameters = (Dictionary<string, Object>)parameters["InputParameters"];
        switch ((string)parameters["ModelType"])
        {
            case "InverseTimeElapsed":
                result = InverseTimeElapsed.compute(uc, featureSetViews, inputParameters);
                break;
            case "CountDurationsByThreshold":
                result = CountDurationsByThreshold.compute(uc, featureSetViews, inputParameters);
                break;
            case "DurationProportionsByThreshold":
                result = DurationProportionsByThreshold.compute(uc, featureSetViews, inputParameters);
                break;
            default:
                break;
        }
        return result;
    }
}

 

From here the algorithms can reference all of their feature sets and parameters by name in the dictionaries. There is also a utility for filtering feature sets: it selects the data from them with a WHERE clause that limits the rows to specific OriginIDs and/or TargetIDs. The filter can also ensure that the features are distinct if needed, as in the conceptual sketch after the signature below.

 

public static DataTable compute(UserCredentials uc, Dictionary<string, Object> featureSetViews, Dictionary<string, Object> inputParameters)
{
    DataTable filteredFeatureSet = FeatureSets.filterFeatureSet(uc, "DurationRelation", featureSetViews, inputParameters, true);
...
}
 
public static DataTable filterFeatureSet(UserCredentials uc, string viewName, Dictionary<string, Object> featureSetViews, Dictionary<string, Object> inputParameters, bool distinct = false, bool origins = true, bool targets = true);
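
The utility's internals aren't shown here, but conceptually it amounts to something like the following sketch (the "OriginIDs"/"TargetIDs" parameter keys and the OriginID/TargetID column names are assumptions on my part, not the utility's actual code):

using System;
using System.Collections.Generic;
using System.Data;

static DataTable filterFeatureSetSketch(UserCredentials uc, string viewName,
    Dictionary<string, Object> featureSetViews, Dictionary<string, Object> inputParameters,
    bool distinct = false, bool origins = true, bool targets = true)
{
    // Select from the feature set's view, optionally de-duplicated.
    string sql = (distinct ? "SELECT DISTINCT * FROM " : "SELECT * FROM ")
                 + (string)featureSetViews[viewName];

    // Limit to the requested origin and/or target IDs, if any were supplied.
    var clauses = new List<string>();
    if (origins && inputParameters.TryGetValue("OriginIDs", out var o) && o is long[] originIDs && originIDs.Length > 0)
        clauses.Add("OriginID IN (" + string.Join(",", originIDs) + ")");
    if (targets && inputParameters.TryGetValue("TargetIDs", out var t) && t is long[] targetIDs && targetIDs.Length > 0)
        clauses.Add("TargetID IN (" + string.Join(",", targetIDs) + ")");
    if (clauses.Count > 0)
        sql += " WHERE " + string.Join(" AND ", clauses);

    return new DataAction(uc).GetDataTable(sql);
}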

 

If for some reason a model produces bad relationship data and you need to do a full refresh of its cache, the easiest way right now is to execute "DELETE FROM MLEntityRelationWeightCache WHERE ModelID = <your model ID>". Since the addition of Required Model functionality (models that a model depends on), clearing the output cache manually this way has a side effect: the required models will not be updated until the cache timestamp has expired, because the timestamp is stored on the model record and the cache policy is only checked when the model is run programmatically through the Required Model dependency metadata. To work around this, either manually null the timestamp on the model record or manually run each required model. A good addition would be a process flow that clears the cache and also nulls the LastUpdate timestamp on each model whose cache was cleared.

Relevancy Engine Classes

The Relevancy Engine consists of three classes: Machine Learning Frontend, Relevancy Request Handler, and Relevancy Controller. The Machine Learning Frontend provides all of the functionality to service a request in .NET, while the Relevancy Request Handler translates to and from JSON to enable handling of web requests. The Relevancy Controller provides the service endpoint for web requests and passes data to and from the Relevancy Request Handler.

The Machine Learning Frontend enables access to the Multi-ML Model Caches, as well as requests for cache updates. Requests to it are highly configurable. While we do not currently support labeled relationships in the code, all other aspects of a request can be specified, including:

  • which columns to return
  • limiting results to a list of origin and/or target IDs
  • limiting to a certain number of results
  • only returning results with weight greater or less than a specified threshold
  • which column to order by, and in which direction
  • how long it is acceptable to wait for a cache update before returning, if a cache miss occurs
  • how often to poll between cache checks while waiting on an update

You do not need to sanitize input against SQL injection when calling the Machine Learning Frontend because it implements its own sanitization. If the requested relationship exists but is not cached, a process flow run will be created to run the model behind the relationship. If you want to wait for this task to complete, you can specify a relatively long maxWait (in milliseconds). In my testing it appears that anything more than 30s causes a SQL connection timeout (the wait is achieved in a sleep loop on the server to avoid having to call the server multiple times), but this may differ depending on how each server is configured. The polling interval is limited to 1/10th of the acceptable waiting period. Keep in mind that the waiting period does not account for delay between your application and the Frontend, or for delay between the Frontend and the SQL server.

 

public static class MachineLearningFrontend
{ 
    private const string defaultOrdering = "DESC", defaultOrderingColumn = "Weight";
    private static string[] validColumns = {"OriginRecordID", "TargetRecordID", "Weight"};
    public static DataTable getRelationship(UserCredentials uc, string originEntityName,
                                            string targetEntityName, string contextName,
                                            LinkedList<string> columns,
                                            long[] originIDs = null, long[] targetIDs = null,
                                            long resultLimit = 0, double threshold = 0,
                                            string thresholdType = "Greater", string ordering = defaultOrdering,
                                            string orderByColumn = defaultOrderingColumn,
                                            int maxWait = 50, int pollInterval = 5)
    {
...
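
A typical call from .NET code, using the signature above, might look like this (the entity names, context name, and the uc and currentPersonID variables are illustrative):

// Ask the "UI" context for the current person's five most relevant
// products, strongest weights first.
var columns = new LinkedList<string>();
columns.AddLast("TargetRecordID");
columns.AddLast("Weight");

DataTable related = MachineLearningFrontend.getRelationship(
    uc, "Persons", "Products", "UI", columns,
    originIDs: new long[] { currentPersonID },
    resultLimit: 5,
    maxWait: 5000,      // wait up to 5 seconds for a cache update on a miss
    pollInterval: 500); // poll the cache every 500 ms while waiting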

 

The Relevancy Request Handler is a JSON wrapper for the Machine Learning Frontend. Multiple requests can be batched as an array of request objects; even a single request must be wrapped in an array. Requests are completed concurrently, so batching should be more performant than sequential requests. Data is returned when all processing is completed for the batch, so if you specify a long maxWait on one request in the batch, the entire batch may take that long to return. There is a special dev mode that enables full error checking of the request, but it should be disabled once the application hitting the endpoint has been debugged, due to the overhead of the error checking. The following example illustrates how to retrieve relations from the cache using a JSON request.

 

var request = [
    {
        "Context": "UI",
        "OriginEntity": "Persons",
        "OriginIDs": [
            Number(Aptify.framework.utility.User.getCurrentUserRelatedEntityRecordId("Persons"))
        ],
        "TargetEntity": "Form Template Parts",
        "TargetIDs": [],
        "Ordering": "DESC",
        "Columns": [
            "TargetRecordID",
            "Weight"
        ]
    }
];
var i;
for (i = 0; i < allTabInfo.length; i++) {
    request[0].TargetIDs[i] = allTabInfo[i].formtemplatepartid;
}
var callback = function (rankedTabs, textStatus, jqXHR) {
    try {
        if (rankedTabs[0][0] && rankedTabs[0][0][0] && !$.isEmptyObject(rankedTabs[0][0][0]) && !rankedTabs[0][0][0].Error) {
            rankedTabs = rankedTabs[0][0];
            for (i = 0; i < rankedTabs.length && i < Aptify.framework.configuration.relevancy.maxTabShortcuts; i++) {
                ... // populate tab shortcuts
            }
        }
    } catch (e) {
        // error handling elided
    }
};

Aptify.framework.utility.getMachineLearningRelation(request, callback, false);

 

This request retrieves a relationship from Persons to Form Template Parts, which represent menu tabs on records. The Person ID is populated with the current user's PersonID, and the Form Template Part IDs are populated with the tabs of the current record. A callback function is constructed that populates the menu bar with shortcuts to tabs in relevant order, and the request and callback are passed into a function that makes the appropriate AJAX call to the Relevancy Controller. Data comes back as an array of arrays, with the first array in each entry of the top-level array representing the data of the response, the second array representing the request as it was parsed, and the third array representing errors if dev mode was enabled and errors were found. More complete documentation on this is provided in the getMachineLearningRelation utility function.

The if block inside the callback is the necessary error checking to ensure the response contains the requested data. rankedTabs[0] indexes into the first (and in this case only) response. rankedTabs[0][0] indexes into the requested data portion of the response (rankedTabs[0][1] would be the echo of the parsed request and rankedTabs[0][2] would be the errors listing if using dev mode). rankedTabs[0][0][0] indexes into the data for the first relation in the response data, which needs to be checked for null, empty object, and for an error response. If the first data entry is valid then all subsequent entries for the request will be valid. 


Ideas Going Forward

As of now the Aptify Machine Learning Engine can be used for creating weighted relationships between two entities, which is really powerful, but there is no friendly user interface, and some features that should exist do not. The first thing that needs to be done is adding code support for the labeled relationship metadata, as well as hybrid relationships in which the confidence rating for the label is also stored and can be used for sorting and filtering. The metadata is already built out for this, so I consider it a necessity. Labeled relationships can be represented by binning weights, but native labels are much more readable for human users. Validation scripts also need to be built out for the entities so that invalid metadata cannot be entered.

Those features are necessities for the core engine to be considered fully complete, but in order to make it user-friendly we need to develop a specialized interface for it. I envision this being manifested in a Machine Learning Management Console, with one portion dedicated to editing models, let's call it the Model Designer, and another dedicated to visualizing and manipulating relationship data, the Graph Composition Tool.

 

The Model Designer would be much like the Process Flow Engine, but with fewer components and a greater focus on displaying information rather than just connections. The two components of the Model Designer would be Feature Sets and Models. Feature Set components take in a SQL view (defined in a DBObject, as in the metadata) and display a summary of the data they represent. Model components take in one or more Feature Sets and contain an algorithm and a set of input parameters for that algorithm. A model also defines what type of relationship is produced (directed or undirected; labeled, weighted, or both), displaying it as something like (Persons)-w->(Companies) for a directed, weighted relationship from Persons to Companies.

It makes sense to abstract Feature Sets away from models because a feature set may be used by multiple models in the same sequence of computation, and the outputs of models may themselves be used as feature sets for other models. An example of this would be using Persons accessing Companies records to create a relationship from Persons to Companies, and also to make a relationship from Persons to Persons based on how similar their access patterns are. These two relationships can then be used to create a third relationship from Persons to Companies that augments a Person's original weights with the weights of other Persons who have similar access patterns, much like the Facebook friends example except more in line with how the News Feed functions. The first two models take in the same feature set, but this does not itself necessitate separating feature sets from models, because one would assume these would appear in separate designer views. It is the third model that necessitates the feature set abstraction: it takes in as its feature set the outputs of the first two models, which requires having all of them in the same designer view. This functionality could be implemented without separating the feature sets, but the overall picture is much clearer when it is visually obvious that all of the models derive from the same input data.

 

While the Model Designer eases the process of inputting model metadata and composing models when one has a clear objective in doing so, much of the value in machine learning comes from viewing and understanding the relationships that are produced, finding relationships one was previously unaware of, and creatively combining seemingly disparate data sets. For this we will need to create a graph viewer application, the Graph Composition Tool.

The basic function of the tool would be to act as a graph viewer. Nodes (origins and targets) would be plotted and the relationships between them would be drawn, including arrows for directed relationships. Weights and labels could be explicitly displayed or encoded in colors and varied line widths. A view of this nature would allow the user to visually interpret what data is connected to other data by these relationships, but it does not convey much beyond the underlying data, and the connections may be too dense to interpret, so some filtering will be necessary.

Methods of filtering the graph view include:

  • applying a threshold, so that only weights above or below the threshold are displayed
  • displaying only certain labels
  • displaying only certain records
  • clustering nodes based on their underlying data, in which similar records are aggregated
  • aggregating based on clustering coefficient, a graph property derived from how strongly connected neighborhoods of nodes are

All of these methods serve to reduce the chaos in the display of the data and help determine what types of data are connected and why.

Once the data for a single model can be viewed and analyzed effectively, there are many interesting compositions that can be produced. For instance, if two models produce relationships between the same nodes, the graph could be redrawn with the same number of nodes but with connections from both models, allowing their results to be visually compared. Similarly, if a model has only one entity in common with the model currently being displayed (so we have three entities), then the additional nodes and connections for the unshared entity can be added, with connections drawn from the two models' unlike entities to their common related entity. These types of compositions can help the user see whether models are functioning in a contradictory or redundant manner, as well as find new relationships that are difficult to derive when viewing and analyzing the data separately. Once a composite relationship is identified, it can then be created as a hard model with cached outputs.

I am aware that much of this would be improved with imagery, so I will try to get some made up, but if you have any questions in the meantime feel free to ask!
