This graphical display combines two functions: an updating scatterplot and a scatterplot snake. The updating scatterplot was developed here in Data Desk (a paper is currently being written). It plots y vs. x, while a separate “ordering variable” determines which data are displayed at any one time. The ordering variable is rescaled to the interval [0, 1], and you control which data are displayed with a ‘location’ and a ‘bandwidth’ parameter. For example, if you have 100,000 observations, plotting them all at once produces an unreadable mess. You can use the density plot discussed above together with this plot to get an understanding of a large dataset.
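The text does not say how the ordering variable is converted to [0, 1]. As a rough sketch only, here are two common choices in Python: a linear (min-max) rescaling and a rank-based rescaling. Both function names are hypothetical, and neither is claimed to be what Data Desk does internally.

```python
def rescale_min_max(values):
    """Map values linearly onto [0, 1] (one assumed conversion)."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: put everything at 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rescale_rank(values):
    """Map values to rank / (n - 1), so equal slices of [0, 1] hold
    equal shares of the observations (another assumed conversion).
    Ties are broken by input order."""
    n = len(values)
    if n <= 1:
        return [0.0] * n
    order = sorted(range(n), key=lambda i: values[i])
    scaled = [0.0] * n
    for rank, i in enumerate(order):
        scaled[i] = rank / (n - 1)
    return scaled


# Hypothetical ordering variable (say income) with one large outlier.
incomes = [20_000, 35_000, 50_000, 250_000]
print(rescale_min_max(incomes))
print(rescale_rank(incomes))
```

With a skewed variable such as income, the two choices behave quite differently: min-max crowds most observations near 0, while the rank version spreads them evenly along the slider.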
For example, you might set the location slider to 0 and the bandwidth slider to 0.05, then slide the location slider from 0 up to 1. As you move the slider, the plot updates continuously and displays only the points whose ordering-variable value lies within location +/- bandwidth. Initially, the data between 0 and 0.05 are displayed, because the ordering variable cannot go below 0. Once you get to, say, location = 0.50, the data between 0.45 and 0.55 of the ordering variable are displayed. In other words, the middle 10% (0.55 - 0.45 = 0.10) of the data, as measured by the ordering variable, is displayed on the plot.
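The selection rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `visible_indices` is hypothetical, the ordering variable is assumed to be already rescaled to [0, 1], and the window is assumed to include its endpoints.

```python
def visible_indices(scaled_order, location, bandwidth):
    """Return indices of points whose ordering value lies within
    location +/- bandwidth, with the window limited to [0, 1]."""
    lo = max(0.0, location - bandwidth)
    hi = min(1.0, location + bandwidth)
    return [i for i, t in enumerate(scaled_order) if lo <= t <= hi]


# Hypothetical ordering values, already on the [0, 1] scale.
scaled_order = [0.0, 0.02, 0.2, 0.47, 0.5, 0.53, 0.8, 1.0]

# Slider at the start: only the points between 0 and 0.05 are shown.
print(visible_indices(scaled_order, location=0.0, bandwidth=0.05))

# Slider in the middle: the points between 0.45 and 0.55 are shown.
print(visible_indices(scaled_order, location=0.5, bandwidth=0.05))
```

Redrawing the scatterplot with just these indices each time a slider moves gives the updating behavior described in the text.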
It is useful to start with random numbers as the ordering variable. Then, as you move the location and bandwidth parameters, you get a basic unstructured view of the data. Next, replace the random variable with a “real” ordering variable (say, income) and step through the data again.
A scatterplot snake is also programmed into this plot. It displays lines dynamically as you move through the ordering variable. This implementation is much more powerful than those in other programs because you control both the location and the bandwidth, instead of just starting the snake and letting it run to the end. If you want to do that, set location = 0, then increase the bandwidth from 0 to 1. I like to set the bandwidth to an amount that doesn’t put so many lines on the plot that it is distracting, then use the location parameter to move through the data.
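As a rough sketch of the snake, the Python below assumes that the lines connect the visible points in increasing order of the ordering variable; the text does not spell this out, so treat it as an assumption. The function name `snake_segments` is hypothetical, and the ordering variable is assumed to be already rescaled to [0, 1].

```python
def snake_segments(x, y, scaled_order, location, bandwidth):
    """Return line segments joining the visible points, taken in
    increasing order of the ordering variable (assumed behavior)."""
    lo = max(0.0, location - bandwidth)
    hi = min(1.0, location + bandwidth)
    visible = [i for i, t in enumerate(scaled_order) if lo <= t <= hi]
    visible.sort(key=lambda i: scaled_order[i])
    path = [(x[i], y[i]) for i in visible]
    # Pair each point with its successor to form the segments.
    return list(zip(path, path[1:]))


# Hypothetical data: four points and their ordering values.
x = [0, 1, 2, 3]
y = [0, 1, 0, 1]
scaled_order = [0.0, 0.9, 0.1, 0.5]

# A short snake near the start of the ordering variable.
print(snake_segments(x, y, scaled_order, location=0.0, bandwidth=0.6))

# location = 0 with bandwidth = 1 draws the snake from start to end.
print(snake_segments(x, y, scaled_order, location=0.0, bandwidth=1.0))
```

A small bandwidth keeps the number of segments low, which matches the advice above about avoiding a distracting number of lines.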