SPSS Data Preparation provides specialized techniques that streamline the data preparation stage of the analytical process. Like all add-on modules, SPSS Data Preparation plugs into SPSS Statistics Base, so you can work seamlessly in the SPSS environment.
Data validation has typically been a manual process. You might run a frequency on your data, print the frequencies, circle what needs to be fixed, and check for case IDs. Needless to say, this is time-consuming. And because every analyst in your organization may use a slightly different method, maintaining consistency from project to project can be a challenge.
To eliminate manual checks, use the Validate Data procedure. This procedure enables you to apply rules that perform data checks based on each variable’s measurement level (categorical or continuous). For example, if you’re analyzing survey data that has variables on a five-point Likert scale, use the Validate Data procedure to apply a rule for five-point scales and flag all cases that have values outside the range of 1 to 5. You can receive reports of invalid cases as well as summaries of rule violations and the number of cases affected. You can specify validation rules for individual variables (such as range checks) and cross-variable checks (for example, “pregnant males”).
With this knowledge, you can determine data validity and remove or correct suspicious cases at your discretion prior to analysis.
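The idea of single-variable and cross-variable rules can be illustrated outside SPSS with a minimal Python sketch. This is not the Validate Data procedure itself or its syntax; the records, field names, rule names, and the `validate` helper are all hypothetical and chosen only to mirror the examples above (a five-point range check and a "pregnant males" check).

```python
# Hypothetical survey records; field names are illustrative only.
cases = [
    {"id": 1, "sex": "F", "pregnant": True, "satisfaction": 4},
    {"id": 2, "sex": "M", "pregnant": True, "satisfaction": 3},
    {"id": 3, "sex": "F", "pregnant": False, "satisfaction": 7},
    {"id": 4, "sex": "M", "pregnant": False, "satisfaction": 5},
]

# Single-variable rules: name -> (variable, predicate returning True when valid).
single_rules = {
    "likert_1_to_5": ("satisfaction", lambda v: v in range(1, 6)),
}

# Cross-variable rules: name -> predicate over the whole case, True when valid.
cross_rules = {
    "no_pregnant_males": lambda c: not (c["sex"] == "M" and c["pregnant"]),
}


def validate(cases, single_rules, cross_rules):
    """Return a dict mapping each violated rule name to the IDs of failing cases."""
    violations = {}
    for case in cases:
        for name, (var, is_valid) in single_rules.items():
            if not is_valid(case[var]):
                violations.setdefault(name, []).append(case["id"])
        for name, is_valid in cross_rules.items():
            if not is_valid(case):
                violations.setdefault(name, []).append(case["id"])
    return violations


report = validate(cases, single_rules, cross_rules)
for rule, ids in report.items():
    print(f"{rule}: {len(ids)} case(s) affected, IDs {ids}")
```

Keeping the rules in named dictionaries means the same rule set can be reused across projects, which is the consistency benefit described above.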

Use the Validate Data procedure to define rules and perform data checks.
Use the Anomaly Detection procedure to keep outliers from skewing your analyses. This procedure searches for unusual cases based on their deviations from similar cases and gives reasons for those deviations. You can flag outliers by creating a new variable. Once you have identified unusual cases, you can examine them further and determine whether they should be included in your analyses.
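As a rough illustration of "deviation from similar cases", the Python sketch below compares each case with a peer group and records a flag and a reason in new fields. It is a deliberate simplification and not the algorithm the Anomaly Detection procedure uses: the peer groups are given by a hypothetical `segment` field, the score is a median-based deviation, and the threshold of 3.5 is an arbitrary assumption.

```python
from collections import defaultdict
from statistics import median

# Hypothetical cases: the same income is typical in one segment, unusual in another.
cases = (
    [{"id": i + 1, "segment": "retail", "income": v}
     for i, v in enumerate([30, 32, 31, 29, 33, 90])]
    + [{"id": i + 7, "segment": "executive", "income": v}
       for i, v in enumerate([88, 90, 92, 91, 89, 90])]
)


def flag_anomalies(cases, group_key, value_key, threshold=3.5):
    """Return copies of the cases with 'anomaly' and 'reason' fields added."""
    peers = defaultdict(list)
    for case in cases:
        peers[case[group_key]].append(case[value_key])

    result = []
    for case in cases:
        values = peers[case[group_key]]
        center = median(values)
        # Median absolute deviation: a spread measure that outliers barely affect.
        mad = median([abs(v - center) for v in values])
        score = abs(case[value_key] - center) / mad if mad else 0.0
        flagged = score > threshold
        reason = (
            f"{value_key} is {score:.1f} deviations from the "
            f"{case[group_key]} median of {center}"
            if flagged else ""
        )
        result.append({**case, "anomaly": flagged, "reason": reason})
    return result


flagged = [c for c in flag_anomalies(cases, "segment", "income") if c["anomaly"]]
for c in flagged:
    print(c["id"], c["reason"])
```

Only the retail case with an income of 90 is flagged; the same value in the executive segment is unremarkable, which is why comparison against similar cases matters more than comparison against the whole dataset.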
To use algorithms that are designed for nominal attributes (such as Naïve Bayes and logit models), you must bin your scale variables before building the model. If scale variables aren’t binned, algorithms such as multinomial logistic regression can take an extremely long time to process, or they might not converge. This is especially true if you have a large dataset. In addition, the results you receive may be difficult to read or interpret.
The Optimal Binning procedure enables you to determine cutpoints that help you reach the best possible outcome for algorithms designed for nominal attributes.
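To make the notion of cutpoints concrete, here is a minimal Python sketch that turns a scale variable into a small number of ordered bins. It uses plain equal-width cutpoints purely for illustration and is not the Optimal Binning algorithm; the `ages` data and both helper functions are hypothetical.

```python
from bisect import bisect_right


def equal_width_cutpoints(values, n_bins):
    """Return the n_bins - 1 cutpoints that split the value range into equal widths."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def assign_bin(value, cutpoints):
    """Return the 0-based bin index; a value equal to a cutpoint goes to the upper bin."""
    return bisect_right(cutpoints, value)


# Hypothetical scale variable.
ages = [18, 22, 25, 31, 38, 44, 52, 60, 67, 78]

cutpoints = equal_width_cutpoints(ages, 3)
binned = [assign_bin(age, cutpoints) for age in ages]

print("cutpoints:", cutpoints)
print("bins:", binned)
```

After binning, the variable has three categories instead of ten distinct values, which is the kind of reduction that algorithms for nominal attributes rely on.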
With this procedure, you can select from three types of binning for preprocessing data prior to model building: