Frog Tutorial

The Big Picture

Important Classes

Simple Data Structures: several basic data structures have been implemented and used throughout the system, these include:

	Vector: usually contains an ordered set of elements, the size of the vector increases automatically;
	Container: this class is used to store an unordered set of elements, these elements can only be accessed through an iterator;
	Graph: this contains both directed and undirected graphs, several standard graph manipulation algorithms have been implemented in this class;
	Random number generator: this can sample integers, reals and Gaussians. Creating several instances allows for independent generators to be run concurrently;
	Matrix: this class represents matrices and has some standard algorithms, such as matrix inversion;
	Hashtable;
	Heap.

RVs (Random Variables): there is a large hierarchy for random variables, this will be described later. The main types are: RVs, that contain a name, unique id and type; Instantiations, that also contain values, and; Sample, which in addition contains a weight and a probability.

Measures: these are unnormalized densities or probability functions, for example: density trees, Gaussians and CPTs.

BNs (Bayes Nets): this is a special type of measure, inheriting also from directed graph with nodes being random variables. This class constains methods such as: add edge and sample.

Inference: this is performed using a join tree, which is an undirected graph, this is also a measure. The most vanilla type of inference implemented is the exact with discrete variables. Other types of inference implemented are: condition Gaussians exact inference and approximate inference as described in Uri & Daphne's paper.

DBNs: this is currently a subclass of BNs, but in future they will both inherit from a common parent. A DBN doesn't contain CPTs for every node, since nodes at time t have no CPTs. For performing inference as described in Xavier & Daphne's paper, the distribution from time t+1 can be translated to time t.

Parser: the parser reads a file and calls the BN API to create a network. There is documentation available about adding new parameters to the parser, thus, allowing the addition of new parameters to the BN file.

Learn: a general greedy search algorithm has been implemented to learn BN structure. Learning has been implemented for comple data and table CPTs. In future, this will be improved, allowing for incomplete data and implementing SEM.

PRMs (Probabilistic Relational Models): this extension of Bayes Nets permits the creation of a more structured model, but it is still flat (as opposed to Object Oriented Bayesian Networks). This class implements learning with complete data and is able to interface with an existing database (PGSQL).

SPOOK: this system for implementing Object Oriented Bayes Nets performs inference by transforming them into a Bayes Net and then uses Frog to perform the inference step.

Design Decisions

Smart Pointers

Smart pointers are a method for implementing garbage collection in C++. They encapsulate objects and contain a reference counter. This counter records how many pointers refer to that object. If the counter is zero, then the memory for that object is deallocated. This allows for simpler coding and less worries about memory leaks.

For every class that can be smart pointed, there is a class file (suffix "_cl"), which contains the code for the class itself, and a smart pointer class (suffix "_sptr"). The smart pointer class contains a pointer to the class and an integer to maintain the reference count. All the standard operators are overloaded in the smart pointer class, including: =, -> and new.

For example, consider the class Name_cl, which contains a string, and the class Name_sptr which is a smart pointer to Name_cl. The following code is examplifies typical operations using smart pointers:

Name_sptr a,b;
b = new Name_cl("aaa");
a = b;
a->Print();

NOTE: a is a smart pointer object, it should not be deleted!

Now consider a subclass of Name_cl called Surname_cl, the following code is valid:
Surname_sptr c;
...
a = c;
a->Print();

In this case, "a" is a superclass of "c", thus there are no problems in the assignment. On the other hand, if we were trying to assign "a" to "c", we would normaly have to typecast "a" when doing the assignment. With smart pointers, we can't use a typecasting, because we don't have a reference to the object. Thus, we must apply a conversion method:
c = Conv2Surname(a);

In general, there exists a method Conv2Type(...), for every Type of smart pointer.

Observations:
There are situations when smart pointers are unnecessary, for example, nodes in a graph need not be smart pointed, as they will be removed when the graph is deallocated.
When a method returns an object and you don't know if it is safe to change this object, one way to check is to check there reference counter on the smart pointer, if the value is one, then that is the only reference to the object, so it must be OK to change it.

Some naming conventions:

_cl - classes

_sptr - smart pointers

_ptr - standard pointers

Random Variables

The figure below illustrates the hierarchy in different classes for random variables.

A Random Variable, Rv, includes:

id: this is an unique identifier for this variable;

name;

domain: this is used to differentiate between discrete and continuous variables.

There are two important distinctions between classes in this hierarchy:

Single versus Vector: Single describes a class that stores only one variable. Vector, on the other hand, uses the Vector class from simple data structures to store a collection of variables.

Rv versus Instance: an Rv describes a random variable, while an instance is an Rv with an associated value. A Sample also includes a weight and a probability for that sample.

Some important methods in VarsSet_cl are:

GetNumberOfVars(void);

GetJointDomainSize(void), this assumes that the variable is discrete;

IsMember(SingleVar_cl var), the equality check is different for each type of variable. For Rvs, it checks if they are the same Rv. For Instances, it checks that the Rv and the Value are the same;

IsPureDiscrete(void).

In InstSet_cl, there are some additional important methods:

Methods for iterating through the set (NextInst(void), PrevInst(void), IsLegalValue(void), SetFirstValue(void), SetLastValue(void) and SetValue(int index1)). In the example below, the variables in the set are printed:

InstSet_sptr inst = .....
for(inst->SetFirstValue(); inst->IsLegalValue(); inst->NextValue())
cout << inst;
RecalculateIndex(void), this method must be called when there is a chance that a different object altered the current index of a variable in this instantiation.

Some additional methods in SingleInst_cl are:

methods to set the value of this instantiation (SetValue(...)).

Comments, questions and suggestions, please contact guestrin@cs.stanford.edu.

	id: this is an unique identifier for this variable;
	name;
	domain: this is used to differentiate between discrete and continuous variables.

	Single versus Vector: Single describes a class that stores only one variable. Vector, on the other hand, uses the Vector class from simple data structures to store a collection of variables.
	Rv versus Instance: an Rv describes a random variable, while an instance is an Rv with an associated value. A Sample also includes a weight and a probability for that sample.

	GetNumberOfVars(void);
	GetJointDomainSize(void), this assumes that the variable is discrete;
	IsMember(SingleVar_cl var), the equality check is different for each type of variable. For Rvs, it checks if they are the same Rv. For Instances, it checks that the Rv and the Value are the same;
	IsPureDiscrete(void).