Data transformation attempts to convert data that is not normally distributed (abnormal data) into normally distributed data. CM4D attempts to transform abnormal data with each of the transform methods listed below and automatically chooses the best one. Generally, if no method improves the data, no transform is applied. Process transforms are an exception to this rule, as described below.
Data transformation can be applied to a DataSet by selecting the DataSet transform option. A DataSet transformation is confined to the document that contains the DataSet. It is applied only to the abnormal data within that DataSet and affects only those annotations that use the DataSet as a DataSource.
Process transforms are created when creating control limits or by selecting the DataSet option to create process transforms. Process transforms will be created only for data that is abnormal at the time the transform is being attempted.
Process transforms are saved in the CM4D database and are applied to all DataSets, in all documents, that use data having process transforms. Each process transform is associated with a specific process baseline and is applied at the process feature characteristic level.
Process transforms are in effect until they are deleted either manually or when updating control limits. Process transforms do not get carried forward to other process baselines. Process transforms will override DataSet transforms where appropriate. Once a process transform is created and saved, it is applied to all appropriate data regardless of whether the data is normal or abnormal.
CM4D will attempt to transform data using the following transformation methods where “(X)” represents a value being transformed:
“-1 / (X)”, “Square-of (X)”, “Square Root of (X)”, “-1 / Square Root of (X)”, “Cube of (X)”, “-1 (X)”, “Log of (X)”, “Box-Cox transform of (X)”, “Power transform of (X)” and “Yeo-Johnson transform of (X)”
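The automatic selection among the fixed-form methods can be pictured with the following minimal Python sketch. It is not CM4D's implementation. The scoring rule (absolute sample skewness, smaller is better) is an assumption made for illustration, since the criterion CM4D uses to pick the "best" method is not stated here. The names CANDIDATES, skewness, and choose_transform are hypothetical. The sketch assumes strictly positive, non-constant input, and it omits the “-1 (X)” entry and the parameterized transforms.

```python
import math
from statistics import fmean

# Fixed-form candidates taken from the list above.
# Assumes every value is strictly positive (needed for 1/x, sqrt and log).
CANDIDATES = {
    "-1 / (X)": lambda x: -1.0 / x,
    "Square of (X)": lambda x: x * x,
    "Square Root of (X)": math.sqrt,
    "-1 / Square Root of (X)": lambda x: -1.0 / math.sqrt(x),
    "Cube of (X)": lambda x: x ** 3,
    "Log of (X)": math.log,
}


def skewness(values):
    """Population skewness; 0 for a perfectly symmetric sample."""
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])
    m3 = fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5


def choose_transform(values):
    """Return (name, data). name is None when no candidate improves the data."""
    best_name, best_values = None, list(values)
    best_score = abs(skewness(values))  # score of the untransformed data
    for name, func in CANDIDATES.items():
        transformed = [func(v) for v in values]
        score = abs(skewness(transformed))
        if score < best_score:  # keep a candidate only if it is an improvement
            best_name, best_values, best_score = name, transformed, score
    return best_name, best_values


# Right-skewed sample: each value doubles the previous one.
name, data = choose_transform([1, 2, 4, 8, 16, 32, 64])
```

Because the untransformed data supplies the starting score, a sample that no candidate improves comes back unchanged with a name of None, which mirrors the "no transform will be applied" rule.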
The Box-Cox, Power, and Yeo-Johnson transforms are attempted with a range of parameters. If one of these transforms is used, the selected parameter is saved to the database together with the transform type.
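As an illustration of trying a transform over a range of parameters, the sketch below implements the standard textbook definitions of the Box-Cox and Yeo-Johnson transforms and searches a parameter grid. The grid (-2.0 to 2.0 in steps of 0.25), the skewness-based scoring, and the names best_parameter and GRID are assumptions for illustration only. The parameter range and selection criterion used by CM4D are not documented here, and the Power transform is left out because its exact form in CM4D is not given.

```python
import math
from statistics import fmean


def box_cox(x, lam):
    """Box-Cox transform of a strictly positive value x with parameter lam."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def yeo_johnson(x, lam):
    """Yeo-Johnson transform; unlike Box-Cox it accepts zero and negative x."""
    if x >= 0:
        if lam == 0:
            return math.log1p(x)
        return ((x + 1.0) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lam) - 1.0) / (2.0 - lam)


def skewness(values):
    """Population skewness; 0 for a perfectly symmetric sample."""
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])
    m3 = fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5


def best_parameter(values, transform, grid):
    """Return the grid parameter whose transformed data is least skewed."""
    return min(
        grid,
        key=lambda lam: abs(skewness([transform(v, lam) for v in values])),
    )


# Hypothetical parameter grid: -2.0, -1.75, ..., 2.0
GRID = [i / 4 for i in range(-8, 9)]

selected = best_parameter([1, 2, 4, 8, 16, 32, 64], box_cox, GRID)
```

Only the selected parameter and the transform type would need to be stored, since the transformed values can be recomputed from the raw data whenever they are needed.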
Information on the Box-Cox and Yeo-Johnson transforms can be found at: http://www.stat.umn.edu/arc/yjpower.pdf
Information on the Power transform can be found at: http://www.itl.nist.gov/div898/handbook/ppc/section4/ppc47.htm
All transforms are applied when a DataSet is resolved, and no transformed data is ever saved to the CM4D database. Control limits and process control statistics calculated from transformed data are saved to the database. Specification limits, reasonable limits, and nominal values are transformed along with individual values. Control limits are never transformed, but they may be calculated from transformed data if transformation is selected during the creation of control limits.
Prior to transforming any data, CM4D will offset the data by the minimum amount needed to remove any negative values. This is required as some transform methods do not operate properly with negative values. After transforming the data with any method, CM4D will re-scale all transformed data to retain the original nominal value and one of the original specification limits.
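The offset and re-scale steps can be sketched as follows. This is a minimal illustration, not CM4D's code. It assumes that the re-scaling is a linear map fixed by two points (the nominal and one specification limit), and that the nominal and the limit are offset and transformed in the same way as the individual values. The function names and the sample numbers are hypothetical.

```python
import math


def offset_to_non_negative(values):
    """Shift the data by the smallest amount that removes negative values."""
    offset = max(0.0, -min(values))
    return [v + offset for v in values], offset


def rescale(transformed, t_nominal, t_limit, nominal, limit):
    """Linearly map transformed data so that the transformed nominal returns
    to the original nominal and the transformed limit to the original limit."""
    scale = (limit - nominal) / (t_limit - t_nominal)
    return [nominal + (t - t_nominal) * scale for t in transformed]


# Hypothetical measurements with a nominal of 0.0 and an upper limit of 8.0.
values = [-1.0, 0.0, 3.0, 8.0]
nominal, upper_limit = 0.0, 8.0

shifted, offset = offset_to_non_negative(values)   # offset is 1.0
transformed = [math.sqrt(v) for v in shifted]      # square-root transform
t_nominal = math.sqrt(nominal + offset)            # nominal, offset and transformed
t_limit = math.sqrt(upper_limit + offset)          # limit, offset and transformed

result = rescale(transformed, t_nominal, t_limit, nominal, upper_limit)
```

In this example the value that equals the nominal (0.0) and the value that equals the upper limit (8.0) return to their original positions, while the remaining values are redistributed between and around them.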
The variable ~transformtype~ will return the transformation type used for a set of data. It may also be viewed in the Feature Editor.