websurvey: feature discovery

[ description - background - transformation - selection - disclaimer ]


description and motivation

How can an agent handle the variables that the environment throws at it? Most agents rely on a priori feature sets defined by their creators. As a result, such an agent is not autonomous in the strict sense, and it is limited by its creator's understanding of the environment. Feature discovery aims to address this problem (and the problems that follow from it).

While compiling this survey, I found researchers from many disciplines addressing this problem, each using their own terminology. Furthermore, since this topic is nascent, the lack of cohesion extends beyond vocabulary.

For clarity, I will adopt some working definitions in this survey. Feature discovery is the process of transformation and selection within the feature space. Transformation is the process of deriving new features to expand the feature space. Selection is the process of picking out the most relevant subset(s) of features to contract the space. So, in a task like pruning connections in a neural net, feature discovery is just another way of stating feature selection. In a higher level task like a chess game, feature discovery has to take on its full definition.
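
To make the working definitions concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not a method from any of the resources below: the function names (transform, select), the choice of pairwise products as derived features, and the use of absolute Pearson correlation as the relevance score are all hypothetical choices made for this toy example.

```python
from itertools import combinations
from statistics import mean


def transform(rows):
    """Expand the feature space: append the product of every pair of original features."""
    expanded = []
    for row in rows:
        derived = [a * b for a, b in combinations(row, 2)]
        expanded.append(list(row) + derived)
    return expanded


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def select(rows, targets, k):
    """Contract the feature space: keep the k columns most correlated with the target."""
    n_features = len(rows[0])
    scores = [
        abs(correlation([row[j] for row in rows], targets))
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data (made up): the target equals x0 * x1, so neither raw feature
# is correlated with it on its own.
rows = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
targets = [1, -1, -1, 1]

expanded = transform(rows)             # columns: x0, x1, x0*x1
kept = select(expanded, targets, k=1)  # index of the single most relevant column
print(kept)
```

In this toy case, selection alone would have nothing useful to pick from; the relevant feature only exists after the transformation step, which is why the full definition needs both halves.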



a little more background

These links should help you get up to speed. I, myself, need to get caught up on this stuff.

What is Occam's Razor?
Occam's Razor motivates the question: Do we really need to look at everything?
Support Vector Machines - the book
I noticed that SVMs pop up a lot in feature discovery. A little background knowledge here can't hurt. Aside from tempting you to buy the book, this site has good links to references and software you can download.
Kernel Machines
Reading up on kernel machines gives more background to your knowledge of SVMs.



on feature transformation

These links are to get your feet wet with the problem of feature transformation. There are few resources purely on the topic of transformation, because nobody wants to grow the feature space without also having a way to reduce it.

Paper: Feature Discovery for Problem Solving Systems
Tom Fawcett's dissertation on this topic. In it, he describes Zenith, a feature discovery system. Being a dissertation, it's a long read.
ClassCK's Feature Selection
An evolutionary programming (EP) approach to feature discovery for the Classifier Construction Kit (ClassCK). The EP algorithm, which is based on minimizing cost, is explained clearly and in depth.
Expert-Guided Subgroup Discovery: Methodology and Application
Another complete feature discovery system. This site is nice because the paper is in HTML form, so you can go to the relevant sections for transformation and selection.
Bibliography of Constructive Induction/Feature Engineering
There's a dearth of good non-paper pages on the topic of transformation. Tom Fawcett has kindly put together an AI bibliography for this topic if you feel inclined to search the literature. Below are some quick links for the citeseer search.
Citeseer search for some papers on the lexical variants of feature transformation.



on feature selection

These links are to get your feet wet with the problem of feature selection. There were more non-paper resources available for this topic. I went ahead and broke them down into subcategories.

workshops
NIPS 2001 Variable and Feature Selection Workshop
The "Problem Description" and "Challenges" sections are a good read for background. Isabelle Guyon chaired this workshop.
NIPS 2003 Feature Extraction and Feature Selection Workshop
A later NIPS workshop chaired by Isabelle Guyon. Notice that the workshop is now addressing the issue of transformation by looking at "feature extraction" as well.
publications
Chapter 3 from Feature Selection for Knowledge Discovery and Data Mining
This course site has a chapter entitled "Feature Selection Aspects" that's a good read. Grab the pdf or click through scanned images.
Journal of Machine Learning Research: Special Issue on Variable and Feature Selection
The results of the NIPS 2001 workshop. Some of the [data] links are pretty nifty.
Studies in Text Categorization
Some publications from a research group interested in feature selection in the domain of text categorization.
algorithms/software
Feature Discovery for Sensorimotor Systems
This is a slideshow for a talk given by Daniel Lee, a researcher at Bell Labs. An algorithm is shown in depth by following a handwriting recognition task. There are great diagrams, and the math is straightforward. Face recognition and semantic analysis are also addressed as other applications.
Feature Selection at PARG
The take on feature selection from PARG (the Pattern Analysis Research Group at Oxford). They also cover feature transformation, because they look at combining features as well. A cell classification task is used to illustrate an algorithm based on maximizing the area of the ROC convex hull. Good diagrams and straightforward discussion.
Feature Selection in the PyML Library
The PyML (Python Machine Learning) library for the Python programming language gives you some ready-to-use feature selection algorithms.
Subset Selection in Multiple Regression in CoStat
Purportedly the "world's best" in feature selection, this page shows the implementation of feature selection used by the CoStat statistical software package. What's great are the problems they describe. They give recommendations that are applicable outside of their software package.
miscellaneous
A Class-Specific Features Web Page
The military applications of feature discovery are obvious. Here is the Navy's Undersea Warfare Center Division's take on the feature selection problem. This is a very good resource since it covers a lot of ground and gives clear illustrations of various ideas within feature selection.
Citeseer search for some papers on the lexical variants of feature selection



DISCLAIMER

I do not consider myself an expert or an authority on feature discovery. That said, please send me email with corrections to this page. It can be for simple things like dead links, but I'll be equally appreciative if you correct my opinions on this page. I'm also very receptive to suggestions!

I apologize for not having a "click me" email thing, but I'm tired of spam. You can probably put two and two together to figure out my email address. If not... I guess try harder?