[ description - background - transformation - selection - disclaimer ]
How can an agent handle the variables that the environment throws at it?
Usually, an agent relies on an a priori feature set that its creator defines.
As a result, the agent is not autonomous in the strict sense, and it is limited
by its creator's understanding of the environment. Feature discovery aims to
address this problem and the problems that follow from it.
While compiling this survey, I found researchers from many disciplines
addressing this problem, each using their own terminology. Furthermore, since this
topic is nascent, there is a lack of cohesion in areas besides vocabulary.
For clarity, I will adopt some working definitions in this survey. Feature
discovery is the process of transformation and selection within the feature
space. Transformation is the process of deriving new features to expand the
feature space. Selection is the process of picking out the most relevant
subset(s) of features to contract the space. So, in a task like pruning
connections in a neural net, feature discovery is just another name for
feature selection. In a higher-level task like a chess game, feature discovery
takes on its full definition: both transformation and selection.
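To make the working definitions concrete, here is a minimal Python sketch. The function names (transform, select, discover), the use of pairwise products as the transformation, and the correlation ranking used for selection are all assumptions made for illustration. None of them come from the systems linked on this page.

```python
from itertools import combinations
from statistics import mean


def transform(rows, names):
    """Expand the feature space: append the product of every pair of features."""
    new_names = list(names)
    new_rows = [list(r) for r in rows]
    for i, j in combinations(range(len(names)), 2):
        new_names.append(f"{names[i]}*{names[j]}")
        for src, dst in zip(rows, new_rows):
            dst.append(src[i] * src[j])
    return new_rows, new_names


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 if either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def select(rows, names, target, k):
    """Contract the feature space: keep the k features most correlated with target."""
    scores = [abs_correlation([r[c] for r in rows], target)
              for c in range(len(names))]
    ranked = sorted(range(len(names)), key=lambda c: -scores[c])
    keep = sorted(ranked[:k])  # restore original column order
    return [[r[c] for c in keep] for r in rows], [names[c] for c in keep]


def discover(rows, names, target, k):
    """Feature discovery in the working sense: transformation, then selection."""
    expanded_rows, expanded_names = transform(rows, names)
    return select(expanded_rows, expanded_names, target, k)


# Toy data (hypothetical): the target is the product of the two raw features,
# so neither raw feature correlates with it on its own.
rows = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
target = [1, -1, -1, 1]
_, kept = discover(rows, ["a", "b"], target, k=1)
print(kept)  # ['a*b']
```

In this toy case, selection alone could never succeed because the useful feature does not exist until transformation derives it. That is the situation the chess example above points at.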
These links should help you get up to speed. I, myself, need to get
caught up on this stuff.
- What is Occam's Razor?
- Occam's Razor motivates the question: Do we really need to look at
everything?
- Support Vector Machines - the book
- I noticed SVMs popped up a lot for feature discovery. A little
background knowledge here can't hurt. Besides tempting you to buy the
book, this site has good links to references and software you can
download.
- Kernel Machines
- Reading up on kernel machines gives you more background for understanding
SVMs.
These links are to get your feet wet with the problem of feature
transformation. There are few resources that are purely on the topic of
transformation because people don't want just a growth of features and no way
to reduce them.
- Paper: Feature Discovery for Problem Solving Systems
- Tom Fawcett's dissertation on this topic. In it, he describes Zenith,
a feature discovery system. Being a dissertation, it's a long read.
- ClassCK's Feature Selection
- An evolutionary programming (EP) approach to feature discovery for the
Classifier Construction Kit (ClassCK). The EP algorithm, which is based on
minimizing cost, is explained clearly and in depth.
- Expert-Guided Subgroup Discovery: Methodology and Application
- Another complete feature discovery system. This site is nice because
the paper is in HTML form, so you can go to the relevant sections for
transformation and selection.
- Bibliography of Constructive Induction/Feature Engineering
- There's a dearth of good non-paper pages on the topic of
transformation. Tom Fawcett has kindly put together an AI bibliography for this
topic if you feel inclined to search the literature. Below are some quick
links for the citeseer search.
- Citeseer search for some papers on the lexical variants of feature
transformation.
These links are to get your feet wet with the problem of feature
selection. There were more non-paper resources available for this topic.
I went ahead and broke them down into subcategories.
workshops
- NIPS 2001 Variable and Feature Selection Workshop
- The "Problem Description" and "Challenges" sections are a good read for
background. Isabelle Guyon chaired this workshop.
- NIPS 2003 Feature Extraction and Feature Selection Workshop
- A later NIPS workshop chaired by Isabelle Guyon. Notice that the
workshop is now addressing the issue of transformation by looking at "feature
extraction" as well.
publications
- Chapter 3 from Feature Selection for Knowledge Discovery and Data Mining
- This course site has a chapter entitled "Feature Selection Aspects"
that's a good read. Grab the pdf or click through scanned images.
- Journal of Machine Learning Research: Special Issue on Variable and Feature Selection
- The results of the NIPS 2001 workshop. Some of the [data] links are
pretty nifty.
- Studies in Text Categorization
- Some publications from a research group interested in feature selection
in the domain of text categorization.
algorithms/software
- Feature Discovery for Sensorimotor Systems
- This is a slideshow for a talk given by Daniel Lee, a researcher at
Bell Labs. An algorithm is shown in depth by following a handwriting
recognition task. There are great diagrams, and the math is straightforward.
Face recognition and semantic analysis are also addressed as other
applications.
- Feature Selection at PARG
- PARG's (Pattern Analysis Research Group at Oxford) take on feature
selection. They also include the topic of feature transformation because they
look at combining features as well. A cell classification task is used to
illustrate an algorithm based on maximizing the area of the ROC-convex hull.
Good diagrams and straightforward discussion.
- Feature Selection in the PyML Library
- The PyML (Python Machine Learning) library for the Python programming
language gives you some ready-to-use feature selection algorithms.
- Subset Selection in Multiple Regression in CoStat
- Purportedly the "world's best" in feature selection, this page shows the
implementation of feature selection used by the CoStat statistical software
package. What's great is the set of problems they describe, and they give
recommendations that are applicable outside of their software package.
miscellaneous
- A Class-Specific Features Web Page
- The military applications of feature discovery are obvious. Here is
the Navy's Undersea Warfare Center Division's take on the feature selection
problem. This is a very good resource since it covers a lot of ground and
gives clear illustrations of various ideas within feature selection.
- Citeseer search for some papers on the lexical variants of feature selection
I do not consider myself an expert or an authority on feature discovery.
That said, send me email to correct this page. It can be for simple things
like dead links, but I'll be equally appreciative if you correct my opinions on
this page. I'm also very receptive to suggestions!
I apologize for not having a "click me" email thing, but I'm tired of spam.
You can probably put two and two together to figure out my email address. If
not... I guess try harder?