LPD: Learning Prototypes and Distances

LPD is an algorithm that simultaneously learns a reduced set of prototypes and a suitable local metric for those prototypes. Starting from an initial selection of a small number of prototypes, it iteratively adjusts both the positions (feature values) of the prototypes and their local-metric weights. The resulting combination of prototypes and metric minimizes a suitable estimate of the classification error probability.
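To make the idea of a local metric concrete, here is a minimal sketch of nearest-prototype search in which every prototype carries its own per-feature weights. It assumes a weighted Euclidean distance; the function names and the row-major array layout are hypothetical and are not taken from lpd.c.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Squared weighted distance between sample x and one prototype p.
 * w holds that prototype's own per-feature weights (its local metric).
 * ASSUMPTION: the weighted Euclidean form is illustrative only. */
double weighted_sq_dist(const double *x, const double *p,
                        const double *w, size_t d)
{
    double sum = 0.0;
    for (size_t j = 0; j < d; j++) {
        double diff = w[j] * (x[j] - p[j]);
        sum += diff * diff;
    }
    return sum;
}

/* Return the index of the prototype nearest to x.
 * protos and weights are row-major arrays holding n rows of d values. */
size_t nearest_prototype(const double *x, const double *protos,
                         const double *weights, size_t n, size_t d)
{
    assert(n > 0 && d > 0);
    size_t best = 0;
    double best_dist = DBL_MAX;
    for (size_t i = 0; i < n; i++) {
        double dist = weighted_sq_dist(x, protos + i * d, weights + i * d, d);
        if (dist < best_dist) {
            best_dist = dist;
            best = i;
        }
    }
    return best;
}
```

Because the weights belong to the prototype rather than to the whole space, shrinking one prototype's weight on a feature can make that prototype the nearest one even when it is farther away in plain Euclidean terms. The predicted class would then be the label of the returned prototype.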

Reference

For full details, please read the draft copy of the paper: LPD_Paredes_Vidal-PR2006.pdf

Source code

Changelog

Version 0.2

Version 0.1

Corpus for testing

Each file has one row per prototype. A row holds a d-dimensional feature vector followed by the class label (which may be a string) in the last column, so each row has d+1 columns.
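The row layout can be illustrated with a small parser. This is only a sketch: it assumes the columns are separated by spaces or tabs, which the description here does not state, and parse_row and MAX_LABEL are hypothetical names rather than anything from lpd.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LABEL 64

/* Parse one corpus row: d numeric features followed by a class label.
 * ASSUMPTION: columns are separated by spaces or tabs.
 * Returns 0 on success, -1 if the row is malformed. */
int parse_row(const char *line, size_t d, double *features, char *label)
{
    assert(line && features && label);
    const char *p = line;
    for (size_t j = 0; j < d; j++) {
        char *end;
        features[j] = strtod(p, &end);
        if (end == p)
            return -1;          /* no number where a feature was expected */
        p = end;
    }
    /* Skip whitespace before the label, then copy up to the next whitespace. */
    p += strspn(p, " \t");
    size_t len = strcspn(p, " \t\r\n");
    if (len == 0 || len >= MAX_LABEL)
        return -1;              /* missing or over-long label */
    memcpy(label, p, len);
    label[len] = '\0';
    return 0;
}
```

The label is kept as a string because the class column is not required to be numeric. The caller has to know d in advance, for example by counting the columns of the first row and subtracting one.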

Some Results

Task      NN-L2   LPD(5%)   LPD(2%)
Balance   40.0%   29.1%     29.4%
Diabetes  32.0%   27.6%     27.0%
DNA       23.4%    6.9%      8.4%

Help

1. Compile (Linux):

gcc -o lpd lpd.c -O6 -lm
gcc -o nnw nnw.c -O6 -lm

2. Learning:

lpd trainingfile reduction [-sigm slope] [-mu mu] [-pmu pmu] [-i it] [-thr thr] [-V] [-seed s] [-fastloo step] [-IM] [-L2] [-ICM] [-fix] [-cmeans]

lpd requires at least two parameters:

  1. training file
  2. size of the reduced set (% of the original training file)

3. Optional parameters (default values in parentheses):

-sigm Sigmoid slope (10)
-mu Weight Learning Step (0.001)
-pmu Prototype Learning Step (0.01)
-thr Threshold to stop iterations (0.0000001)
-i Max Iterations (10000)
-seed Random seed (1234567)
-V Verbose
-IM Distance initialized to Diagonal Mahalanobis
-L2 Distance initialized to Euclidean
-ICM Distance initialized to Class-Dependent Diagonal Mahalanobis
-fastloo Subsampling for the LOO estimation (1, no subsampling)
-fix Use the same number of prototypes for every class, taken from the reduction argument
-cmeans Use c-means prototype initialization instead of random initialization

4. Some examples:

5. Program result, two files:

6. Test the result:

nnw reduced.lpd XXXX_test weights.lpd

Research