October 2001
Volume 5, Issue 1


Social Science Computing Laboratory
Faculty of Social Science
Room 1228, Social Science Centre
The University of Western Ontario
London, Ontario, Canada, N6A 5C2

E-mail: ssts@uwo.ca
Web: www.ssc.uwo.ca/ssnds
Phone: 519 661-2152 
Managing Editor: Ramona Fudge

SSC Network Update

Internet Data Library System (IDLS) Version 2

Vince Gray

The Internet Data Library System (IDLS), Version 2, is the third interface to machine-readable data developed at the Social Science Computing Laboratory (SSCL) in the past sixteen years. Technically, it marks a departure from both prior systems: it is strictly a Web-based client/server system. Conceptually, it marks a leap forward, giving the user the ability to search for data at the file, variable, or value level. Thus, for example, a user can now easily identify files that have variables concerned with cocaine, whether in question wording or in responses to questions.

The new IDLS is based on Inmagic databases, which contain the metadata (i.e., documentation) about the files, variables, and values. Queries run against these databases are used in turn to generate a query on our SQL Server database, which stores the data in one or more tables per file. Finally, Visual Basic and Perl programs assemble user requests and return data and documentation over the Internet.
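
The two-stage lookup described above can be sketched roughly as follows. This is only an illustration, not the actual IDLS code (which is written in Visual Basic and Perl): the metadata records, the table and variable names, and the helper functions are all hypothetical, and Python's built-in sqlite3 module stands in for the SQL Server database.

```python
import sqlite3

# Hypothetical metadata records standing in for the Inmagic databases.
METADATA = [
    {"file": "health_survey", "variable": "coc_use", "label": "Ever used cocaine"},
    {"file": "health_survey", "variable": "age", "label": "Age of respondent"},
    {"file": "labour_survey", "variable": "income", "label": "Annual income"},
]


def find_variables(term):
    """Stage 1: return (file, variable) pairs whose label mentions the term."""
    term = term.lower()
    return [(m["file"], m["variable"]) for m in METADATA if term in m["label"].lower()]


def build_query(table, columns):
    """Stage 2: build a SELECT statement from names found in the metadata."""
    quoted = ", ".join(f'"{c}"' for c in columns)
    return f'SELECT {quoted} FROM "{table}"'


# sqlite3 is an assumption here; it stands in for the SQL Server data store.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "health_survey" (id INTEGER, coc_use INTEGER, age INTEGER)')
conn.executemany('INSERT INTO "health_survey" VALUES (?, ?, ?)', [(1, 0, 34), (2, 1, 27)])

hits = find_variables("cocaine")          # metadata search
table, variable = hits[0]
sql = build_query(table, ["id", variable])  # generated data query
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)
```

The point of the sketch is the separation of concerns: the searchable documentation lives in one place, and its search results supply the table and column names for the query against the data itself.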

To load a file into IDLS, basic statistics (e.g., frequencies) are first obtained for each variable using SPSS. The documentation (whether paper or electronic) is then fitted into the appropriate fields of the Inmagic databases to make this information searchable.
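
For readers unfamiliar with frequency tables, the kind of per-variable statistic mentioned above looks roughly like this. The sketch uses Python rather than SPSS, and the response codes are invented for illustration.

```python
from collections import Counter

# Hypothetical coded responses for one variable (9 might be a "missing" code).
responses = [1, 2, 2, 1, 9, 2, 1, 1]


def frequencies(values):
    """Return {value: (count, percent)} for each distinct value, in sorted order."""
    counts = Counter(values)
    total = len(values)
    return {v: (n, round(100 * n / total, 1)) for v, n in sorted(counts.items())}


table = frequencies(responses)
for value, (count, percent) in table.items():
    print(value, count, percent)
```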

The web address for IDLS is idls.ssc.uwo.ca. When first entering the system, you will be presented with the “Select by Data File Category” screen (see Figure 1). This screen is roughly comparable to the prior version of IDLS, which required you to choose a category of data and then select a file from it. Choosing any one of the options in the selection box will retrieve all files pertaining to that subject or category. However, the system supports far more sophisticated searches.

Raw data files

The most sophisticated searches permitted by the new system are “Search by File Name”, “Search by File Description” and “Search by Variable Description”. Each of these three searches permits free text searching of fields. Full Boolean operations are permitted in constructing queries; “Help using IDLS” provides assistance with syntax. “Search by File Name” restricts the search to the name of the file itself. “Search by File Description” adds the abstract field to the list of fields searched, which effectively adds the bulk of the documentation about the file itself to the searchable material.

Thus, if Acquired Immune Deficiency Syndrome (AIDS) is mentioned in the abstract of the file, but not in the title of the file, the file would be found by the file description search, but not by the file name search. Note, of course, that the file description search is also more likely to retrieve irrelevant files, because many of the abstracts contain a great deal of information.
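
The difference between the two searches can be sketched as follows. The file records are invented, and simple substring matching stands in for the system's Boolean query syntax, which is not reproduced here.

```python
# Hypothetical file records; names and abstracts are invented for illustration.
FILES = [
    {"name": "National Health Survey 1996",
     "abstract": "Includes questions on AIDS awareness and smoking."},
    {"name": "AIDS Attitudes Study",
     "abstract": "Public attitudes toward the disease."},
    {"name": "Labour Force Survey",
     "abstract": "Employment and earnings."},
]


def search(term, fields):
    """Return names of files where the term appears in any of the given fields."""
    term = term.lower()
    return [f["name"] for f in FILES
            if any(term in f[field].lower() for field in fields)]


by_name = search("aids", ["name"])                      # file name search
by_description = search("aids", ["name", "abstract"])   # file description search
print(by_name)
print(by_description)
```

Searching the abstract as well as the name finds the health survey that the name-only search misses, at the cost of matching more text per file.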

The result will be a list of files that corresponds to the search term. From this point, you can:

  1. view the access controls on the file (who may use it, and how);
  2. view the file description (i.e., documentation about the file itself);
  3. view additional documentation on the web, if available (e.g., PDF user’s guides, questionnaires); or
  4. select variables from the file itself.

When you select variables from the file, you are presented with a list of all the variables in the data file; from that list, you may select the variables you require. Note that certain variables (such as ID number and weights) will automatically be selected for you: the ID number allows separate requests to be merged if you forget variables, and the weights allow you to make population estimates rather than analyse sample results.
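
A small sketch shows why these automatically selected variables are useful. The records, variable names, and weights below are hypothetical; the code merges two separate requests on the ID number and then compares an unweighted sample result with a weighted population estimate.

```python
# Two hypothetical requests from the same file, keyed by ID number.
first = {
    101: {"weight": 250.0, "sex": "M"},
    102: {"weight": 400.0, "sex": "F"},
    103: {"weight": 350.0, "sex": "F"},
}
second = {
    101: {"weight": 250.0, "smoker": 1},
    102: {"weight": 400.0, "smoker": 0},
    103: {"weight": 350.0, "smoker": 1},
}


def merge_requests(a, b):
    """Combine the variables of two requests for every ID present in both."""
    return {rid: {**a[rid], **b[rid]} for rid in a.keys() & b.keys()}


merged = merge_requests(first, second)

# Unweighted: share of smokers among the sampled respondents.
sample_share = sum(r["smoker"] for r in merged.values()) / len(merged)

# Weighted: estimated number of smokers in the population.
population_estimate = sum(r["weight"] * r["smoker"] for r in merged.values())
print(sample_share, population_estimate)
```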

If you have done a search on variable descriptions, the variables which led you to be interested in the file to begin with will also be selected automatically, so that you do not have to browse through the entire list to find them.

Information about each variable is only a click away: if you click on the variable label, the metadata which pertains to that variable will be displayed in a separate browser window.

Having selected your variables, you may then restrict the data you retrieve by selecting values of those variables (see Figure 2). Thus, you may decide to retrieve data on males in Ontario between the ages of 15 and 55 earning over $50,000. Be warned, however: every restriction you impose works in conjunction with every other restriction, so a record is retrieved only if it satisfies all of them. It is quite possible to wind up with an empty data file by imposing too many restrictions.
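
The effect of combining restrictions can be sketched as follows, using invented records and hypothetical variable names. Each restriction is a test, and a record is kept only if every test passes.

```python
# Hypothetical records; the variable names are invented for illustration.
RECORDS = [
    {"sex": "M", "province": "Ontario", "age": 42, "income": 62000},
    {"sex": "M", "province": "Ontario", "age": 30, "income": 41000},
    {"sex": "F", "province": "Ontario", "age": 50, "income": 75000},
    {"sex": "M", "province": "Quebec", "age": 25, "income": 58000},
]


def apply_restrictions(records, restrictions):
    """Keep only the records that satisfy every restriction (logical AND)."""
    return [r for r in records if all(test(r) for test in restrictions)]


restrictions = [
    lambda r: r["sex"] == "M",
    lambda r: r["province"] == "Ontario",
    lambda r: 15 <= r["age"] <= 55,
    lambda r: r["income"] > 50000,
]

matched = apply_restrictions(RECORDS, restrictions)

# One more restriction that no remaining record satisfies empties the result.
too_strict = apply_restrictions(RECORDS, restrictions + [lambda r: r["age"] < 20])
print(len(matched), len(too_strict))
```

In this toy data set, four restrictions already reduce four records to one, and a fifth leaves nothing to retrieve.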

When you have imposed all needed restrictions, proceed to submit your request for data (see Figure 3). You must enter your e-mail address, so that we may notify you of the location of the data retrieved, and subsequently, if necessary, of any problems with the data.

You may select any or all of the data set description formats (SAS, SPSS, or Stata). Each of these files will have commands to read the data file, along with variable labels, value labels, and missing value declarations. In each case, you must explicitly identify within the data set description both the name of the data file and where you have stored it. You are then ready to begin your analysis.

Other files

For these files, there may or may not be an abstract; there will be no variable information loaded into the system. Thus, they are not retrievable through the “Variable description” search option.

“Incomplete” raw data files

Not all raw data files in IDLS have been completely loaded yet: in many cases, only the file record is available. If you need to use one of those files, you can request it through the “Request currently unavailable data” link.

Beyond 20/20 tables and Excel spreadsheets

Statistics Canada, and other data organizations, have been distributing tables of data in Beyond 20/20 format for several years now (including the 1996 Census of Canada). The Beyond 20/20 tables themselves are linked in IDLS, so you may search by name for the file, and then load it directly into Beyond 20/20. Additionally, a number of Excel spreadsheets have been linked in the system.

Maps

We have begun to load MapInfo and ArcInfo maps into the system. These may be found by name, and then downloaded for use on your local machine.

Web sites

Certain data web sites, such as CANSIM and the World Bank’s Poverty Monitoring Database, have been linked in the system. Thus, it is possible to locate those sites by searching within IDLS. Our intention is to add more sites as time permits.

Help

A PowerPoint walkthrough of IDLS is available by searching for the term IDLS with “Search by File Name”.