LPD: Learning Prototypes and Distances
LPD is an algorithm that simultaneously trains a reduced set of prototypes and a suitable local metric for these prototypes. Starting from an initial selection of a small number of prototypes, it iteratively adjusts both the position (features) of each prototype and its local-metric weights. The resulting combination of prototypes and metric minimizes a suitable estimate of the classification error probability.
Reference
For full details, please read the draft copy of the paper: LPD_Paredes_Vidal-PR2006.pdf
Source code
- LPD C implementation (plus a Nearest Neighbor classifier with weighted distance, for testing the results)
Changelog
Version 0.2
- Skip the LOO estimations when a distance initialization is selected
- New argument: [-fix] to use the same number of prototypes for every class. Example: lpd training 3 -fix sets the number of prototypes per class to 3 instead of 3%
- New argument: [-cmeans] to use C-means initialization
Version 0.1
- Initial version
Corpus for testing
The files have as many rows as prototypes. Each row is a d-dimensional vector (prototype) followed by its class label in the last column (the label can be a string), so each row has d+1 columns.
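As an illustration of this row layout, the following sketch parses one line into d features plus a label. It assumes the columns are separated by whitespace, which this page does not confirm; parse_row is a hypothetical helper, not a function from lpd.c.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse one row: d numeric features followed by a class label.
   Assumes whitespace-separated columns (an assumption, not confirmed
   by the source). Returns 0 on success, -1 on a malformed row. */
static int parse_row(const char *line, int d, double *features,
                     char *label, size_t label_size)
{
    const char *p = line;
    char fmt[32];

    if (label_size < 2)
        return -1;                 /* no room for a label */

    for (int j = 0; j < d; j++) {
        char *end;
        features[j] = strtod(p, &end);
        if (end == p)
            return -1;             /* no number where one was expected */
        p = end;
    }

    /* The last column is the label; it may be a string.
       Build a width-limited "%Ns" format to avoid overflowing label. */
    snprintf(fmt, sizeof fmt, "%%%zus", label_size - 1);
    if (sscanf(p, fmt, label) != 1)
        return -1;                 /* label column is missing */
    return 0;
}
```

For a file with d = 4, a row such as "1 2.5 3 4 L" would yield four features and the label "L"; a row with fewer than d numbers is rejected.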
Some Results
- NN: Nearest Neighbor classification rule
- NN-L2: NN using the original training set and Euclidean distance
- LPD(5%): NN using the reduced set (5% of original) and weighted distance, both learned by LPD
- LPD(2%): NN using the reduced set (2% of original) and weighted distance, both learned by LPD
| Task | NN-L2 | LPD(5%) | LPD(2%) |
|---|---|---|---|
| Balance | 40.0% | 29.1% | 29.4% |
| Diabetes | 32.0% | 27.6% | 27.0% |
| DNA | 23.4% | 6.9% | 8.4% |
Help
1. Compile (Linux):
gcc -o lpd lpd.c -O6 -lm
gcc -o nnw nnw.c -O6 -lm
2. Learning:
lpd trainingfile reduction [-sigm slope] [-mu mu] [-pmu pmu] [-i it] [-thr thr] [-V] [-seed s] [-fastloo step] [-IM] [-L2] [-ICM] [-fix] [-cmeans]
lpd requires at least two parameters:
- training file
- size of the reduced set (% of the original training file)
3. Optional Parameters:
| Option | Description (default) |
|---|---|
| -sigm | Sigmoid slope (10) |
| -mu | Weight learning step (0.001) |
| -pmu | Prototype learning step (0.01) |
| -thr | Threshold to stop iterations (0.0000001) |
| -i | Max iterations (10000) |
| -s | Random seed (1234567) |
| -V | Verbose |
| -IM | Distance initialized to Diagonal Mahalanobis |
| -L2 | Distance initialized to Euclidean |
| -ICM | Distance initialized to Class-Dependent Diagonal Mahalanobis |
| -fastloo | Subsampling for the LOO estimation (1, no subsampling) |
| -fix | Use the same number of prototypes per class, given by the reduction argument |
| -cmeans | Use C-means prototype initialization instead of random |
4. Some examples:
- Basic use:
lpd balance_training 5
- Faster leave-one-out estimation:
lpd dna_training 5 -fastloo 3
- To force a particular initialization of the distance weights:
lpd dna_training 5 -fastloo 3 -L2
- To change the step factors:
lpd diabetes_training 5 -mu 0.0001 -pmu 0.001
5. Program output, two files:
- reduced.lpd: file with the reduced set of prototypes, in the same format as the training file
- weights.lpd: file with the weights associated with each prototype of the reduced set
6. Test the result:
nnw reduced.lpd XXXX_test weights.lpd
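The test step applies a Nearest Neighbor rule with a weighted distance over the reduced set. The sketch below shows one way such a rule could look; it is not the code of nnw.c. It assumes one weight vector per prototype and a distance of the form sum_j w[j]^2 * (x[j] - p[j])^2, and the names weighted_sq_dist and nearest_prototype are hypothetical.

```c
#include <float.h>
#include <stddef.h>

/* Squared weighted distance between a test vector x and a prototype p,
   using the prototype's own weight vector w. Assumed form:
   sum_j w[j]^2 * (x[j] - p[j])^2. */
static double weighted_sq_dist(const double *x, const double *p,
                               const double *w, size_t d)
{
    double s = 0.0;
    for (size_t j = 0; j < d; j++) {
        double diff = w[j] * (x[j] - p[j]);
        s += diff * diff;
    }
    return s;
}

/* Index of the prototype nearest to x under the local weighted distance.
   protos and weights are row-major arrays of m rows and d columns.
   Returns 0 when m is 0, so callers should check m first. */
static size_t nearest_prototype(const double *x, const double *protos,
                                const double *weights, size_t m, size_t d)
{
    size_t best = 0;
    double best_dist = DBL_MAX;
    for (size_t i = 0; i < m; i++) {
        double dist = weighted_sq_dist(x, protos + i * d, weights + i * d, d);
        if (dist < best_dist) {
            best_dist = dist;
            best = i;
        }
    }
    return best;
}
```

The predicted class of a test vector would then be the label of the returned prototype. Because each prototype carries its own weights, a prototype that is farther away in Euclidean terms can still win if its weights are small, which is what makes the metric local.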