SPSS Trainer Tip: Clementine® 10.0

Instructor profile

Name: Jim Mott

About Jim: Jim has nearly 20 years of experience with SPSS, Inc. From 1984 to 1998, he served as technical writer, technical support specialist, and internal trainer. Jim has been a senior education consultant since 1998. He received a BA from Knox College and an MA and PhD from the University of Illinois at Chicago. In his spare time, Jim enjoys playing classical piano, attending the opera, playing golf, and hiking and camping.

Identifying the most important predictors for a given analysis prior to modeling in Clementine 10.0

When you build predictive models in Clementine, you often have to choose among hundreds of potential fields to use as predictors.

While there is no substitute for domain expertise when selecting fields, you can shorten the process considerably with the Feature Selection node. This node automatically screens out fields based on a variety of criteria, such as having too much missing data or too little variation. It can also rank the remaining fields by importance. To do so, each predictor is tested against the target field using the appropriate bivariate analysis to produce a probability value (p-value). If both the predictor and the target are continuous, a Pearson correlation is used. If both are categorical, the node performs a chi-square test. Each probability value is then turned into an importance measure by subtracting it from 1. An importance measure close to 1 corresponds to a probability value close to 0, which indicates strong statistical evidence that the two fields are related.
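The importance calculation described above can be illustrated outside Clementine with a minimal Python sketch. It is not the node's actual implementation: it handles only the simplest categorical case (a 2x2 table, Pearson chi-square without continuity correction), and the field names, counts, and function names are hypothetical.

```python
import math


def chi_square_p_2x2(table):
    """P-value of the Pearson chi-square test for a 2x2 contingency table.

    With 1 degree of freedom, the chi-square upper tail probability equals
    erfc(sqrt(statistic / 2)), so no statistics library is needed.
    Assumes every row and column total is non-zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return math.erfc(math.sqrt(statistic / 2.0))


def importance(p_value):
    """Importance measure as described in the text: 1 minus the p-value."""
    return 1.0 - p_value


def rank_fields(p_values):
    """Return (field, importance) pairs sorted from most to least important."""
    scored = [(name, importance(p)) for name, p in p_values.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical counts: rows = predictor category, columns = churned / stayed.
tables = {
    "contract_type": [[30, 70], [60, 40]],
    "gender": [[45, 55], [46, 54]],
}
p_values = {name: chi_square_p_2x2(t) for name, t in tables.items()}
for name, score in rank_fields(p_values):
    print(f"{name}: importance = {score:.4f}")
```

In this made-up data, churn rates differ sharply between the two contract types but hardly at all between genders, so `contract_type` ranks well above `gender`.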

To use the Feature Selection node:

Figure 1: You’ve selected “Churn” as your target field and chosen 127 input fields.

The Model tab and Options tab can be used to refine the criteria used to screen variables and to define their importance. (In this example, however, we will use the default settings.) Before receiving the model’s output, you’ll need to:

Figure 2: The model selected 63 predictor fields and rejected seven.

At the bottom of the output screen (see Figure 2), you’ll notice that seven of the predictors were rejected because of too much missing data or too little variation. The model selected 63 of the remaining fields as important, which considerably reduces the work of field selection.

Now you can add the generated Feature Selection model to the stream, where it will filter out the unimportant fields.
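For readers who want to see the screening and filtering idea in code, the following is a rough Python sketch, not Clementine's own logic. It rejects fields with too much missing data or too little variation and then keeps only the surviving fields plus the target. The thresholds, field names, and function names are assumptions made for illustration, and the sketch covers the screening criteria only, not the importance ranking.

```python
from collections import Counter

# Hypothetical thresholds, chosen for illustration only.
MAX_MISSING_FRACTION = 0.7        # reject if more than 70% of values are missing
MAX_SINGLE_VALUE_FRACTION = 0.9   # reject if one value covers more than 90%


def screen_field(values):
    """Return a rejection reason for one field, or None if it passes."""
    n = len(values)
    present = [v for v in values if v is not None]
    if n == 0 or (n - len(present)) / n > MAX_MISSING_FRACTION:
        return "too much missing data"
    most_common_count = Counter(present).most_common(1)[0][1]
    if most_common_count / len(present) > MAX_SINGLE_VALUE_FRACTION:
        return "too little variation"
    return None


def select_fields(records, field_names):
    """Split candidate fields into kept names and rejected names with reasons."""
    kept, rejected = [], {}
    for name in field_names:
        reason = screen_field([r.get(name) for r in records])
        if reason is None:
            kept.append(name)
        else:
            rejected[name] = reason
    return kept, rejected


def filter_records(records, kept, target):
    """Keep only the selected fields and the target in every record."""
    return [{k: r.get(k) for k in kept + [target]} for r in records]


# Hypothetical data: "region" never varies and "fax" is mostly missing.
records = [
    {"tenure": i, "region": "north", "fax": None if i < 8 else "y", "churn": i % 2}
    for i in range(10)
]
kept, rejected = select_fields(records, ["tenure", "region", "fax"])
filtered = filter_records(records, kept, "churn")
print("kept:", kept)
print("rejected:", rejected)
```

Applying `filter_records` here plays the role that the generated model plays in the stream: downstream steps see only the fields that passed.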
