Templates
Below is a collection of user-submitted templates that are free to download. You can also create and submit your own templates using the following form. In the form, please provide your name and a description of what your template does so that we can credit you and give users helpful information about your template.
2x2 Chi-Square
This template analyzes a 2×2 contingency table from just the counts entered in the cells n11, n12, n21, and n22. There is no need to enter 1’s and 0’s for every case or to use the Replicate command.
Author: Matthew C. Hutcheson
Download: 2x2-Chi-Square-Template.dsk
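For readers who want to check the arithmetic by hand, here is a minimal Python sketch of a chi-square test computed directly from the four cell counts. It is not the template's own code. The function name `chi_square_2x2` is hypothetical, and the sketch assumes the uncorrected Pearson statistic with one degree of freedom.

```python
import math

def chi_square_2x2(n11, n12, n21, n22):
    """Pearson chi-square statistic and p-value for a 2x2 table (1 df).

    Assumption: no continuity correction is applied.
    """
    n = n11 + n12 + n21 + n22
    row1, row2 = n11 + n12, n21 + n22
    col1, col2 = n11 + n21, n12 + n22
    # Shortcut formula for a 2x2 table.
    stat = n * (n11 * n22 - n12 * n21) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 df, P(chi-square > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

stat, p_value = chi_square_2x2(10, 20, 30, 40)
print(stat, p_value)
```

Only the four counts are needed, which mirrors the idea of entering n11, n12, n21, and n22 rather than one row per case.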
Confidence Intervals
This template demonstrates the random nature of confidence intervals. It generates random samples from a variable and plots confidence intervals for the variable mean generated by those samples.
Author: Chris Noble
Download: Confidence-Intervals.dsk
Cronbach's Alpha
Cronbach’s Alpha is a statistic that measures the reliability of tests, observations, experiments or measurements by estimating the extent to which they provide the same results on repeated trials. Cronbach’s Alpha normally takes a value between 0 and 1. Values near 0 indicate low reliability. Values near 1 indicate high reliability. (See Cronbach, 1951 and Carmines and Zeller, 1979).
Author: Data Description, Inc.
Download: Cronbachs-Alpha.dsk
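As an illustration of the statistic itself, the following Python sketch computes alpha from the usual variance formula, k/(k-1) * (1 - sum of item variances / variance of total scores). The function name `cronbach_alpha` and the data layout (one list of scores per item) are assumptions made for this example and are not taken from the template.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: list of k lists, one per item, scored for the same respondents."""
    k = len(items)
    # Total score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Two items that agree perfectly give the maximum reliability.
print(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]]))
```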
Density Plot
The density plot assigns a grid to the scatterplot and counts the number of points in each cell of that grid, then displays them three-dimensionally with the count as the third dimension. Adding lines to this three-dimensional plot creates a fabric-like mesh across the plot revealing the density of “overstrikes” (points piled on top of one another) and clusters in the data.
The user has control over the underlying scatterplot grid. Use finer grids on larger datasets to separate out the smallest clusters within the data. The plot is also colored so peaks turn red while valleys are green or blue. Combined with the ability to rotate, this graphical display is wonderful for presentations and revealing patterns in the data.
Author: Matthew C. Hutcheson
Download: Density-Plot.dsk
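The counting step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the template's implementation. The function name `grid_counts` and the choice of a square grid with `bins` cells per axis are assumptions.

```python
def grid_counts(x, y, bins):
    """Count the points falling in each cell of a bins-by-bins grid."""
    x_lo, x_hi = min(x), max(x)
    y_lo, y_hi = min(y), max(y)
    counts = [[0] * bins for _ in range(bins)]
    for xi, yi in zip(x, y):
        # min() keeps points at the upper edge inside the last cell.
        col = min(int((xi - x_lo) / (x_hi - x_lo) * bins), bins - 1)
        row = min(int((yi - y_lo) / (y_hi - y_lo) * bins), bins - 1)
        counts[row][col] += 1
    return counts

print(grid_counts([0, 0, 1, 1], [0, 0, 1, 1], bins=2))
```

The resulting counts play the role of the third dimension in the plot. A larger `bins` value corresponds to a finer grid.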
Durbin-Watson
Computes the Durbin-Watson statistic for serial correlation.
Author: Matthew C. Hutcheson
Download: Durbin-Watson-Template.dsk
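For reference, the Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals. A minimal Python sketch follows. The function name `durbin_watson` is hypothetical, and the code is independent of the template.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a sequence of regression residuals."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Residuals that alternate in sign push the statistic above 2.
print(durbin_watson([1, -1, 1, -1]))
```

Values near 2 suggest little serial correlation, values toward 0 suggest positive correlation, and values toward 4 suggest negative correlation.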
Ewma By Append
Exponentially-weighted moving average (Ewma) solved using iteration. Given scalar ‘alpha’ (between 0 and 1) and vector ‘X’, this template solves for L, such that:
L(t) = alpha*X(t) + (1-alpha)*L(t-1)
L(1) = X(1)
It solves this by looping through all the cases and appending the results.
Author: Paul Pratt
Download: Ewma-By-Append.dsk
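The loop-and-append approach can be sketched in Python as follows, assuming the recursion is L(t) = alpha*X(t) + (1-alpha)*L(t-1) with L(1) = X(1). The function name `ewma` is hypothetical, and this is not the template's own code.

```python
def ewma(x, alpha):
    """Exponentially weighted moving average, built by appending one case at a time."""
    smoothed = [x[0]]  # L(1) = X(1)
    for value in x[1:]:
        # L(t) = alpha * X(t) + (1 - alpha) * L(t-1)
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

print(ewma([1, 2, 3], alpha=0.5))
```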
Ewma By Iteration
Exponentially-weighted moving average (Ewma) solved using iteration. Given scalar ‘alpha’ (between 0 and 1) and vector ‘X’, this template solves for L, such that:
L(t) = alpha*X(t) + (1-alpha)*L(t-1)
L(1) = X(1)
It uses iteration if alpha is closer to 1.
Author: Paul Pratt
Download: Ewma-By-Iteration.dsk
F Test for Variances
This template performs the standard F-test for equality of population variances. Given a sample s1 of size n from population p1 and another sample s2 of size m from population p2, the ratio (svar1/svar2)/(popvar1/popvar2) has an F distribution.
This template computes the sample standard deviations and then compares their ratio to an appropriate F distribution (specifically one with n-1 and m-1 degrees of freedom, respectively). It then reports the p-value of obtaining a result as extreme or more extreme than the result obtained given that the chosen null hypothesis is true.
Author: Data Description, Inc.
Download: F-Test-for-Variances.dsk
Fourier Transform
Use this template to compute a Fourier Transform of any size real data sequence.
Author: Matthew C. Hutcheson & Paul Pratt
Download: Fourier-Transform.dsk
Globe 3D
The Globe 3D template takes latitude and longitude (in degrees) as input and creates a three-dimensional rotating globe. This plot provides an exciting look at geographic data. Furthermore, you can add colors by variables of interest to reveal interesting patterns in the globe.
Author: Unknown
Modified by: Matthew C. Hutcheson
Download: Globe-3D.dsk
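To place points on a globe, latitude and longitude in degrees are typically converted to three-dimensional coordinates on a unit sphere. The Python sketch below shows one common convention (x toward latitude 0, longitude 0, and z toward the north pole). The function name `to_unit_sphere` and the axis convention are assumptions, and the template may use a different orientation.

```python
import math

def to_unit_sphere(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to (x, y, z) on a unit sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return x, y, z

print(to_unit_sphere(0, 0))
```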
Index Plots
The index plot is often used when plotting distance measures such as Mahalanobis distance or Hadi’s distance. Each observation has a line drawn from its value to zero, which creates a series of vertical lines. With a time series on the x-axis, dips, gaps, or peaks in the sequence of vertical lines reveal places where something interesting may be happening in that series.
Try this plot with the residuals as y and the predicted values as x from a regression or linear model. You may be surprised by the patterns revealed.
Author: Matthew C. Hutcheson
Download: Index-Plots.dsk
Kolmogorov Test for Normality
Use this template to perform the popular Kolmogorov Test for normality on a sample.
Author: Data Description, Inc.
Download: KolmogorovTestForNormality.dsk
Kruskal Wallis Test
Use this template to perform the nonparametric Kruskal Wallis test.
Author: Data Description, Inc.
Download: Kruskal-Wallis.dsk
Levene Test
The Levene Test tests for nonconstant variance across groups. This template computes, for each measured value, the absolute value of the difference between the measured value and the group median.
Author: Data Description, Inc.
Download: Levene_Test.dsk
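The deviation step described above can be sketched in Python as follows. As an assumption for this example, the sketch then computes a one-way ANOVA F statistic on those deviations, which is the usual way a median-based Levene test proceeds. The function name `levene_median` is hypothetical, and the p-value lookup is left out.

```python
from statistics import mean, median

def levene_median(groups):
    """Return the absolute deviations from each group median and the ANOVA F on them."""
    devs = [[abs(v - median(g)) for v in g] for g in groups]
    n_total = sum(len(d) for d in devs)
    k = len(devs)
    grand_mean = mean(v for d in devs for v in d)
    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(d) * (mean(d) - grand_mean) ** 2 for d in devs)
    within = sum((v - mean(d)) ** 2 for d in devs for v in d)
    f_stat = (n_total - k) / (k - 1) * between / within
    return devs, f_stat

devs, f_stat = levene_median([[1, 2, 3], [2, 4, 6]])
print(devs, f_stat)
```

The F statistic would be compared with an F distribution on k-1 and N-k degrees of freedom.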
Logistic Regression - Grouped
This template provides a complete generalized linear model for binary dependent variables. It expects grouped data, meaning one case for each covariate pattern. This template uses the Iteratively Reweighted Least Squares method.
Author: Walter Linde-Zwirble
Download: Logistic-Regression-grouped.dsk
Logistic Regression - Ungrouped
The ‘logistic regression – ungrouped’ template is a re-tooling of the ‘logistic regression – grouped’ template. In this template, the data do not have to be grouped. It requires a binary dependent variable and can analyze any number of factors. This template offers a selection of link functions and computes a wide range of statistics including model likelihood, deviance, dispersion, Hosmer-Lemeshow, and ROC.
Author: Data Description, Inc.
Download: Logistic-Regression-ungroup.dsk
Mandelbrot
A fun and interesting build of the famous Mandelbrot Set fractal.
Author: Walter Linde-Zwirble
Download: Mandelbrot.dsk
Mantel-Haenszel
The Mantel-Haenszel template computes the Mantel-Haenszel statistic and the corresponding Chi-Square based p-value with one degree of freedom. The MH test is used for combining several 2×2 contingency tables to obtain one test statistic and one p-value. For example, the MH test is often used to combine several clinical trial results together into one larger test.
For the technical folks, the user has control over the correction factor and the variance component and thus can obtain results based on Mantel and Haenszel (1959), Cochran (1954) or Grizzle (1967).
Author: Matthew C. Hutcheson
Download: Mantel-Haenszel.dsk
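As a rough illustration of how several 2×2 tables are combined, the Python sketch below sums the observed-minus-expected counts and the variances across strata. The function name `mantel_haenszel`, the `correction` parameter, and the choice of the hypergeometric variance with N-1 in the denominator are assumptions for this example. The template lets the user vary these, and this sketch does not reproduce its options.

```python
import math

def mantel_haenszel(tables, correction=0.5):
    """tables: list of (a, b, c, d) counts, one 2x2 table per stratum.

    Returns the chi-square statistic (1 df) and its p-value.
    """
    diff = 0.0
    var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff += a - row1 * col1 / n  # observed minus expected count in cell a
        var += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    stat = max(abs(diff) - correction, 0.0) ** 2 / var
    # With 1 df, P(chi-square > x) equals erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

print(mantel_haenszel([(10, 20, 30, 40), (15, 15, 20, 30)]))
```

Setting `correction=0` gives the statistic without a continuity correction.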
Multiple Time Series Plot
This template plots up to six dependent variables as functions of a single independent variable in one graph, and allows for easy rescaling of the vertical axis for each variable. This extends the multiple line plots in several ways. Relationships between variables measured in entirely different units or scales can be examined without creating derived variables. The independent variable can have uneven intervals (it is not just the case number) and need not even be sorted. It is meant to be useful for biological time-series data, which often have measurements unevenly spaced in time.
Author: Chris Noble
Download: Multiple-Time-Series.dsk
Parallel Coordinate Plot
The parallel coordinate plot is designed to view multidimensional data. Each variable (up to 8 here) is standardized to [0,1] and then plotted as several side-by-side dotplots. Lines connect each case in one variable (or dotplot) to its corresponding case in every other variable.
In the past few years, this plot has become quite popular. Fascinating spirograph-like patterns appear if you plot two sorted random normals (one sorted ascending and one descending) or Cauchy variables. Note: You can generate a Cauchy variable by dividing a Normal(0, 1) by another, independent Normal(0, 1).
Author: Matthew C. Hutcheson
Download: Parallel-Coordinate-Plot.dsk
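Two of the steps mentioned above are easy to try outside the template. The Python sketch below rescales a variable to [0, 1] and generates Cauchy values as the ratio of two independent standard normals. The function names `rescale_unit` and `cauchy_sample` are hypothetical, and min-max rescaling is an assumption about what "standardized to [0,1]" means here.

```python
import random

def rescale_unit(values):
    """Min-max rescale a variable to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def cauchy_sample(n, seed=0):
    """Draw n Cauchy values as Normal(0, 1) divided by an independent Normal(0, 1)."""
    rng = random.Random(seed)
    return [rng.gauss(0, 1) / rng.gauss(0, 1) for _ in range(n)]

print(rescale_unit([2, 4, 6]))
print(rescale_unit(cauchy_sample(5)))
```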
Quadwise Plot
The Quadwise plot is an attempt to view four-dimensional data. This template plots two y-variables and two x-variables. y1 vs. x1 is plotted on the left-hand side of the “quadwise” scatterplot, and y2 vs. x2 is plotted on the right-hand side. Lines connect each case in the left-hand scatterplot with its corresponding case in the right-hand scatterplot.
Author: Matthew C. Hutcheson
Download: Quadwise-Plot.dsk
Regression CI Visualization
This template demonstrates regression confidence intervals. One somewhat difficult topic in teaching regression is explaining to students why confidence intervals for regression lines are hyperbolic in shape. This template allows one to visualize the process.
The user has control over the sample size, the number of samples, the amount of error variance and heteroskedasticity (non-constant variance). Adjusting the error heteroskedasticity reveals hyperbolic shapes with narrow cones on one end and fat cones on the other end. Increase the sample size to get tighter intervals.
Author: Matthew C. Hutcheson
Download: Regression-CI-Visualization.dsk
Sampling Distribution
This template demonstrates the empirical sampling distribution of a sample mean. This can be used to demonstrate the Central Limit Theorem without the restriction of sampling from a uniform population. This template also demonstrates the empirical sampling distribution of the difference between means. It can be used to test the hypothesis of equality of means through resampling rather than parametric methods.
Author: Chris Noble
Download: Sampling-Distribution.dsk
Simple Regression Intervals
This template draws prediction and confidence interval bands for a simple regression of Y vs. X on a scatterplot of the data. It also calculates the exact endpoints of these intervals for a user-defined X-value. The user has control over the confidence level, the X-value for the calculated interval, and the color of the lines on the interval plot.
Author: John H. Walker
Download: simple_regr_interval.dsk
Simulation Sample
This program illustrates the Central Limit Theorem. A random uniform variable with a given number of cases is generated. Its mean is computed and appended to the end of a variable which is plotted in a histogram and probability plot.
Author: Paul Pratt
Download: Simulation_Sample.dsk
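The simulation described above can be imitated in Python with only the standard library. This sketch repeatedly draws a uniform sample, computes its mean, and appends the mean to a growing list. The function name `simulate_sample_means` and its parameters are assumptions, and the plotting step is omitted.

```python
import random
from statistics import mean

def simulate_sample_means(sample_size, n_samples, seed=0):
    """Collect the means of repeated random uniform(0, 1) samples."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.random() for _ in range(sample_size)]
        means.append(mean(sample))  # append each new mean to the end
    return means

means = simulate_sample_means(sample_size=30, n_samples=1000)
print(min(means), max(means))
```

A histogram of `means` should look roughly bell-shaped and centered near 0.5, even though each individual sample is uniform.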
Tweak B0 and B1
This template draws a scatterplot and gives the user control over the fitted line. When the user changes the intercept and slope, the line moves and the summary statistics are recomputed automatically. This gives insight into minimizing SSE and maximizing R². The user also controls the error structure to see its effect on the regression and the scatterplot.
Finally, the user can hit a button to fit the line automatically by minimizing either the sum of squared errors (least squares) or the sum of absolute errors.
Author: Matthew C. Hutcheson
Download: Tweak-B0-and-B1.dsk
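To see how the summary statistics respond to a chosen intercept and slope, here is a small Python sketch that computes SSE and R² for any candidate line. The function name `line_fit_stats` is hypothetical, and R² is taken here as 1 - SSE/SST, which is an assumption about how the template reports it.

```python
def line_fit_stats(x, y, b0, b1):
    """Return (SSE, R-squared) for the candidate line y = b0 + b1 * x."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return sse, 1 - sse / sst

x = [0, 1, 2]
y = [1, 3, 5]
print(line_fit_stats(x, y, b0=1, b1=2))  # the exact line through the points
print(line_fit_stats(x, y, b0=0, b1=2))  # a line shifted down by one unit
```

Moving the intercept or slope away from the least-squares values raises SSE and lowers R².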
Update Scatterplot/Snake
This graphical display combines two functions: an updating scatterplot and a scatterplot snake. The updating scatterplot was developed here in Data Desk (a paper is currently being written). Plot y vs. x. A separate “ordering variable” determines which data are displayed at any one time. This ordering variable is rescaled to [0, 1]. Control which data are displayed using the ‘location’ and ‘bandwidth’ parameters. For example, if you have 100,000 observations, plotting them all at once is just a mess. You can use the density plot discussed above together with this plot to get an understanding of the large dataset.
For example, you might set the location slider to 0 and set the bandwidth slider to 0.05. Then, slide the location slider from 0 up to 1. As you move the slider, the plot continually updates and only displays the points between location +/- bandwidth as determined by the ordering variable. Initially, the data between 0 and 0.05 are displayed. Once you get to, say, location = 0.50, then data is displayed that lies between 0.45 and 0.55 of the ordering variable. In other words, the middle 10% (55-45) of the data is displayed on the plot.
It is useful to use random numbers as the ordering variable. Then, as you move the location and bandwidth parameters, you get a basic unstructured view of the data. Then, replace the random variable with a “real” ordering variable (say income) and update through the data.
A scatterplot snake is also programmed into this plot, displaying lines dynamically as you move through the ordering variable. This implementation is much more powerful than other programs because you control both the location and the bandwidth instead of starting the snake and letting it run to the end. If you want to do that, just set the location = 0, then increase the bandwidth from 0 to 1. I like to set the bandwidth to an amount that doesn’t put so many lines on the plot that it is distracting, then use the location parameter to move through the data.
Author: Matthew C. Hutcheson
Download: Update-ScatterplotSnake.dsk
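The location and bandwidth selection can be sketched in Python as follows. The ordering variable is rescaled to [0, 1], and only the cases within location +/- bandwidth are kept. The function name `visible_points` is hypothetical, and min-max rescaling is an assumption about how the template converts the ordering variable.

```python
def visible_points(order, location, bandwidth):
    """Return the indices of cases whose rescaled ordering value lies
    within location +/- bandwidth."""
    lo, hi = min(order), max(order)
    scaled = [(v - lo) / (hi - lo) for v in order]
    return [i for i, s in enumerate(scaled) if abs(s - location) <= bandwidth]

order = list(range(11))  # rescales to 0.0, 0.1, ..., 1.0
print(visible_points(order, location=0.5, bandwidth=0.15))
```

Sliding `location` from 0 to 1 with a fixed `bandwidth` moves the window through the data. Holding `location` at 0 and raising `bandwidth` to 1 brings in every case.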
Vertical Calculation
Use this template to perform Vertical Calculations for datasets of any size.
Author: Matthew C. Hutcheson
Download: Vertical_Calculation.dsk
Vertical Calculation 2
This is another version of the previous template dealing with Vertical Calculations.
Author: Paul Pratt
Download: Vertical_Calculation_2.dsk
ZipMap
This template is useful for displaying data geographically. It contains a database of latitudes and longitudes associated with five-digit zip codes for the continental US. If you have a variable that contains five-digit zip codes, drop it into the “socket” and click on the button named Display Map.
Author: Matthew C. Hutcheson
Download: ZipMap.dsk