Missing Values

Sometimes we cannot obtain a data value for each case. Values are lost, experiments are ruined, respondents don’t respond, observations are not observed, records are not recorded, and so on. You can indicate a missing data value in a quantitative variable with a blank or with any non-numeric characters. Any data value that isn’t a valid number is treated as a missing value by any operation that requires numbers. Thus you can label missing values with information about the cause of the omission (e.g., “bad point” or “Holiday”).

Monetary unit symbols ($, £, ¥, €) and commas separating thousands digits are treated as part of the number, so values that contain them are read correctly. A percent (%) or cents (¢) symbol following a number indicates that the number should be divided by 100 before being used in calculations. Data Desk also correctly interprets numbers when your computer is set to the European standard of separating thousands with “.” and decimals with “,”.
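The reading rules above can be sketched in Python. This is only an illustration of the behavior described, not Data Desk's own code: the helper name parse_value is hypothetical, the sketch assumes the symbol comes before the number for money and after it for % and ¢, and it does not handle the European separator convention.

```python
import math

CURRENCY_SYMBOLS = "$£¥€"   # symbols read as part of a number
DIVIDE_BY_100 = "%¢"        # trailing symbols that scale the value

def parse_value(text):
    """Return the numeric value of a cell, or NaN if it is not a valid number.

    Hypothetical helper. Note that Python's float() also accepts a few
    strings such as "inf" that this sketch does not filter out.
    """
    s = text.strip()
    if not s:
        return math.nan                  # a blank cell is missing
    if s[0] in CURRENCY_SYMBOLS:
        s = s[1:]                        # drop a leading money symbol
    s = s.replace(",", "")               # drop thousands separators
    scale = 1.0
    if s and s[-1] in DIVIDE_BY_100:
        scale = 100.0                    # "12%" means 12 / 100
        s = s[:-1]
    try:
        return float(s) / scale
    except ValueError:
        return math.nan                  # any other text is missing

print(parse_value("$1,234.50"))   # 1234.5
print(parse_value("12%"))         # 0.12
print(parse_value("Holiday"))     # nan
```

Returning NaN rather than raising an error mirrors the idea that a descriptive label such as “Holiday” is simply skipped by numeric operations.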

Sometimes it is useful to mark a case as missing temporarily to remove it from an analysis. One easy way to do so is to place a non-numeric character in front of the case value. A “*” makes a good marker that is easy to find and remove when the numeric value is needed again. The data value “*3.4” is treated as missing, but preserves its original numeric value for later reference. To mark a case without making it missing, use a money symbol, since a value such as “$3.4” is still read as a number.
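As a rough illustration of the marking technique, the Python sketch below prefixes a value with “*” to exclude it from a mean and then removes the marker to restore it. The function names (mark_missing, unmark, as_number) are hypothetical, and the number reading is simplified to Python's float(), so it stands in for Data Desk's behavior rather than reproducing it.

```python
import math

MARK = "*"   # marker character suggested in the text

def mark_missing(cell):
    """Prefix the cell text with the marker so it no longer reads as a number."""
    return cell if cell.startswith(MARK) else MARK + cell

def unmark(cell):
    """Remove the marker, restoring the original numeric text."""
    return cell[len(MARK):] if cell.startswith(MARK) else cell

def as_number(cell):
    """Simplified reader: valid numbers become floats, anything else NaN."""
    try:
        return float(cell)
    except ValueError:
        return math.nan

cells = ["2.1", "3.4", "5.0"]

# Temporarily exclude the second case from the analysis.
cells[1] = mark_missing(cells[1])            # now "*3.4"
values = [as_number(c) for c in cells]
present = [v for v in values if not math.isnan(v)]
mean_without = sum(present) / len(present)   # mean of 2.1 and 5.0 only

# Bring the case back when its value is needed again.
cells[1] = unmark(cells[1])                  # "3.4" once more
```

Because the marker is stored in the cell text itself, the original value is never lost and can be recovered by deleting one character.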

When Data Desk opens a variable that resulted from an internal computation, it displays missing values with the “•” symbol. You can type this symbol as Alt-8 on Windows, or Option-8 on Mac. Data consisting of category or group names is considered missing only if the case is empty or consists of a single “•”. Otherwise, the text is treated as a category name.

Data Desk represents missing values internally with a construct called a NaN, or “Not a Number”. NaNs also result from calculations that involve missing values, or from calculations that yield a non-numeric result (such as the square root of –1 or the log of 0).
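The way NaNs spread through calculations can be seen with ordinary floating-point numbers in Python. This is a sketch of general NaN behavior, not of Data Desk's internals. One difference to note: Python's math.sqrt raises an error for negative input instead of returning a NaN, so the hypothetical helper sqrt_or_nan converts that error into a NaN to match the behavior described above.

```python
import math

nan = float("nan")

# Any arithmetic that involves a NaN produces a NaN.
total = 1.5 + nan
print(math.isnan(total))    # True

# A NaN is not equal to anything, including itself,
# so test for it with math.isnan rather than ==.
print(nan == nan)           # False

def sqrt_or_nan(x):
    """Square root that yields NaN for inputs with no real result."""
    try:
        return math.sqrt(x)
    except ValueError:      # raised by math.sqrt for negative x
        return math.nan

def mean_skipping_missing(values):
    """Mean of the non-missing values, or NaN if every value is missing."""
    present = [v for v in values if not math.isnan(v)]
    return sum(present) / len(present) if present else math.nan

print(math.isnan(sqrt_or_nan(-1)))               # True
print(mean_skipping_missing([1.0, nan, 3.0]))    # 2.0
```

Skipping NaNs explicitly, as mean_skipping_missing does, is what lets a summary ignore missing cases instead of becoming missing itself.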