Overview of Importing Data

The first step toward visualizing and analyzing your own data is preparing your data so it can be imported into SpaceStat. Since SpaceStat is built upon our unique space-time platform, all of your data must be associated with a spatial location and a time. Currently the software supports analyses based on two representations of space: point and polygon geographies. You may also import line geographies for visualization purposes, but they cannot currently be used with the statistical methods.

In many cases, your first step in creating a SpaceStat project will be to import a shapefile that represents the spatial support for your data or to import a feature class from a gdb file. This is often an areal (polygon) representation of counties, states, zip codes, or some similar political boundary, but can be any collection of polygons or points. Within shapefiles or gdb files, each spatial unit has a unique identifier (ID), which allows information or attributes of that spatial unit to be linked and tracked over time. If the shapefile already contains attribute information (datasets) as part of the associated DBF, you can begin to visualize and analyze your data right away. If not, you can import datasets as Excel, text, or DBF files, and link them to the geography with a common ID column. Point geographies can be imported as shapefiles or gdb files, or as Excel, text, or DBF files of coordinates and IDs.
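The ID-based link described above can be pictured as a simple lookup. The following is a minimal sketch in Python, not SpaceStat's own import code; the field names (`id`, `name`, `rate`) and the sample values are made up for illustration.

```python
# Hypothetical geography records, as they might appear in a shapefile's DBF.
geography = [
    {"id": "A", "name": "County A"},
    {"id": "B", "name": "County B"},
    {"id": "C", "name": "County C"},
]

# Hypothetical dataset rows from an Excel, text, or DBF file,
# keyed by the same ID column as the geography.
dataset = [
    {"id": "A", "rate": 12.5},
    {"id": "C", "rate": 8.1},
]


def link_by_id(geography, dataset, id_field="id"):
    """Attach dataset attributes to spatial units that share the same ID.

    Returns (linked, unmatched): `linked` holds the merged records and
    `unmatched` lists geography IDs that have no dataset row.
    """
    rows_by_id = {row[id_field]: row for row in dataset}
    linked, unmatched = [], []
    for unit in geography:
        row = rows_by_id.get(unit[id_field])
        if row is None:
            unmatched.append(unit[id_field])
        else:
            linked.append({**unit, **row})
    return linked, unmatched


linked, unmatched = link_by_id(geography, dataset)
```

A check like this before import can show which spatial units would end up without attribute values because their IDs do not appear in the dataset file.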

Importing shapefiles, gdb files or point data coverages is a routine task when working with GIS software, and is likely to be familiar to many SpaceStat users. However, in SpaceStat you also need to be able to assign each attribute to a time, because SpaceStat allows you to work with data for which the attributes, and even the shape of a given polygon, can change over time. Similarly, the location and attributes of objects represented in point geographies can change over time, but the ID of each point remains constant. The dynamic nature of how data are represented in SpaceStat makes it possible to visualize and analyze data in exciting new ways, but it does mean that the process of getting data into the program involves a few additional steps to ensure that objects and their attributes are represented correctly. Here we provide a brief overview of what is required to bring data into SpaceStat, and how data can be modified once they have been successfully imported.

Handling time - time series or time slice?

Once you have chosen a polygon or point file to bring into SpaceStat, the first question you will need to address is "What is the most appropriate way to assign time stamps to the data?" In SpaceStat you have two options: to treat data as a time series, or as time slices. We provide this choice because sometimes you will be working with data that are collected at specific time intervals (time slices), while in other cases values can change for different objects at different times (best represented as a time series). Time slices are cases where data values or locations change synchronously across objects in the geography (or at least are all recorded at the same time), so time stamps can be applied to all objects at once, and data for different time intervals within the total interval can be grouped into one dataset during the import process (see the Illinois tutorial for an example). However, if attributes or locations of objects change at different times (asynchronously), the start and end time for each combination of attribute and location needs to be recorded on its own line in the data file. As a result, each object ID is likely to be repeated in the file to be imported; we call temporal data of this type "time series". For both temporal representations, SpaceStat treats the first date or time as an inclusive start value and the end value as exclusive. There is an "advanced" option on the import dialog box to change this default if desired.
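The inclusive start and exclusive end convention can be illustrated with a short Python sketch. The record layout (`id`, `start`, `end`, `value`) and the sample values are hypothetical, and the code only mimics the default convention described above; it is not how SpaceStat stores data internally.

```python
from datetime import date

# Hypothetical time series rows: one row per object ID and validity period.
# Each period is treated as [start, end): start inclusive, end exclusive.
records = [
    {"id": "P1", "start": date(2000, 1, 1), "end": date(2003, 1, 1), "value": 4.0},
    {"id": "P1", "start": date(2003, 1, 1), "end": date(2005, 1, 1), "value": 6.5},
    {"id": "P2", "start": date(2001, 6, 1), "end": date(2004, 6, 1), "value": 2.2},
]


def is_active(record, when):
    """True if `when` falls in the record's half-open interval [start, end)."""
    return record["start"] <= when < record["end"]


def values_at(records, when):
    """Map each object ID to the value in effect at `when`."""
    return {r["id"]: r["value"] for r in records if is_active(r, when)}


# On 2003-01-01 the first P1 row has already ended (exclusive end)
# and the second P1 row has just begun (inclusive start).
snapshot = values_at(records, date(2003, 1, 1))
```

Because the end value is exclusive, consecutive rows for the same object can share a boundary date without both being in effect on that date.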

In the end, a time series data file requires more "up-front" work to get it ready for import (i.e., each line must have a start and end date or time), but there are fewer steps in the import process. Many sources of publicly available space-time data (e.g., cancer rate data for a county or state geography) represent time slices, which are easy to prepare for import (often just a matter of copying values and object IDs into a spreadsheet), but these datasets require additional steps during the import process to assign time stamps. Click here to see how these two different ways of handling time data are formatted for SpaceStat, and here to link to the two import tutorials (mentioned above) that demonstrate import of time slice and time series data.
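To show how the two layouts relate, the sketch below expands a small time slice table (one column per year) into time series rows (one row per object and interval). It is an illustration only: the column names and values are hypothetical, treating each year as the interval from that year to the next is an assumption, and the formatting page linked above remains the authority on what SpaceStat expects.

```python
# Hypothetical time slice table: one row per object, one column per year.
time_slice_rows = [
    {"id": "A", "1990": 10.0, "1991": 11.0},
    {"id": "B", "1990": 7.0, "1991": 7.5},
]


def slices_to_series(rows, years, id_field="id"):
    """Expand a wide time slice table into one row per object and interval.

    Each year becomes the half-open interval [year, year + 1).
    """
    series = []
    for row in rows:
        for year in years:
            series.append({
                "id": row[id_field],
                "start": year,
                "end": year + 1,
                "value": row[str(year)],
            })
    return series


series_rows = slices_to_series(time_slice_rows, [1990, 1991])
```

The time slice table has one row per object, while the time series form repeats each object ID once per interval, which is where the extra up-front work comes from.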

Once you have decided whether to import as a time slice or time series, check on how dates, times, and locations need to be formatted, and then look here for help on the File import dialog box.

Coordinating time stamps across geographies and datasets

When working with temporal data, a meaningful time stamp must be provided for geographies during import, even if the spatial characteristics of the geography do not change over time (e.g., county or state boundary shapefiles or gdb files). This is because the time stamps for geographies and datasets in your project determine the start and end dates on your animation (time) slider. For objects to be seen in maps at the same time, their time stamps must overlap at least partly. Objects in the geographies and datasets of a given project cannot be visualized together if their time stamps do not overlap.

For example, suppose a geography is imported and defined with a very wide time slice (e.g., 1950 - 2000). If you then attempt to view datasets linked to objects in the geography that have a much narrower time slice (e.g., 1960 - 1965), the mapped values would appear only while the animation runs through 1960 - 1965; the polygons would appear blank through the rest of the animation.
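The overlap requirement in this example can be checked with a few lines of Python. This is a sketch under the assumption that each time stamp is a (start, end) pair with an inclusive start and exclusive end; the function name is hypothetical and not part of SpaceStat.

```python
def overlap(a, b):
    """Return the shared part of two half-open intervals, or None if disjoint.

    Each interval is a (start, end) pair: start inclusive, end exclusive.
    """
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start < end else None


geography_span = (1950, 2000)  # time slice assigned to the geography
dataset_span = (1960, 1965)    # time slice assigned to the dataset

# The dataset can only be mapped during the shared period.
shared = overlap(geography_span, dataset_span)
```

Two intervals that merely touch, such as (1950, 1960) and (1960, 1965), do not overlap under this convention, so `overlap` returns `None` for them.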

If none of your data have a time component, the default import time format can be used for all files you bring in. In order to work with these files together, they must all receive the same time stamp.

Modifying geographies by importing and merging a new file

Once you have an existing geography, you can import changes in shapes or positions by bringing in another shapefile, gdb file, text file, or Excel file with a different time stamp and merging it with an existing geography. In most cases, this can be accomplished with time slice data where the end of the first time slice (exclusive) is set to match the beginning (inclusive) of the time slice for the geography you are merging. For SpaceStat to allow objects to move or change shape, the objects must have the same ID in both geographies. The "Merge with existing geography" option will appear in the "Import method" menu within the "Import file" dialog box when you are working on projects for which you have already imported at least one geography.
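Before merging, it can help to confirm that the two time slices meet end to start and to see which object IDs appear in both files. The sketch below is a hypothetical pre-check written in Python; the dictionary layout (`start`, `end`, `ids`) and the sample values are assumptions made for illustration, and SpaceStat performs its own handling during the merge.

```python
def compare_slices(existing, incoming):
    """Compare two geography time slices before a merge.

    Each argument is a dict with "start", "end" (exclusive), and "ids"
    (a set of object IDs). Returns a small report as a dict.
    """
    return {
        # The exclusive end of the first slice should equal
        # the inclusive start of the second.
        "contiguous": existing["end"] == incoming["start"],
        # Only objects with the same ID in both slices can move or change shape.
        "shared_ids": sorted(existing["ids"] & incoming["ids"]),
        "only_in_existing": sorted(existing["ids"] - incoming["ids"]),
        "only_in_incoming": sorted(incoming["ids"] - existing["ids"]),
    }


before = {"start": 1990, "end": 2000, "ids": {"A", "B", "C"}}
after = {"start": 2000, "end": 2010, "ids": {"A", "B", "D"}}
report = compare_slices(before, after)
```

In this made-up case, objects A and B are present in both slices, while C and D each appear in only one of them.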

Modifying geographies after import

There are also many ways to create geographies and datasets from those that have already been imported, as described in the "Create New Data" page. Briefly, tools for modifying geographies include the ability to create a subset of an existing geography, to create a centroid geography from a polygon geography, and to alter the spatial structure of a geography through a set of tools we group under "Scale Conversion/Interpolation" in the methods menu.

Modifying and creating datasets after import

There are many ways to modify or create new datasets within SpaceStat. The resulting datasets fall into two categories:

1. products of techniques you can use as part of a visualization process or to prepare data for further analysis, and

2. products of a statistical analysis.

As part of a data preparation or visualization step, you may want to standardize datasets using the Z-score transformation, or smooth them using the Empirical Bayesian Smoother. Prior to conducting an analysis, you may need to modify an existing dataset using one of the dataset calculator's functions (e.g., take an absolute value, or multiply by a constant).
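As a reminder of what the Z-score transformation does, the following Python sketch subtracts the mean from each value and divides by the standard deviation. It uses the population standard deviation, which is an assumption; SpaceStat's own implementation may differ (for example, by using the sample standard deviation). The input values are made up.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1.

    Uses the population standard deviation (an assumption for this sketch).
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mu) / sigma for v in values]


# Hypothetical attribute values for eight spatial units.
rates = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
standardized = z_scores(rates)
```

After standardization, values are expressed in standard deviations from the mean, which makes datasets measured on different scales easier to compare.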

The creation of new datasets is primarily achieved through the use of statistical methods. In some cases, the new datasets are the focal product of the analysis, and in others they help you to evaluate spatial patterns in your results (e.g., by mapping the residuals from a regression analysis).