Phase-III Macro System at PhUSE Berlin 2010

From phenixxenia.org

A case of “fractal” system architecture using programming languages and procedures from SAS Institute Inc.

Phase-III Macro System

Author
WOLF-DIETER BATZ, Phenix-MTK GmbH 
Summary

The Phase-III Macro System is a flexible, data-independent and parameter-controlled set of SAS macros. For maintainability, module size is kept small (three screen pages at most), and modules avoid hard-coded references to application-related information such as data types, labels and formats. The coding style makes broad use of automatic documentation and of generating metadata and lookup tables at runtime.
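To illustrate what "data independent" means here, the following minimal sketch derives column headers from a dataset's descriptor at runtime instead of hard-coding them. It is written in Python, although the actual system consists of SAS macros. The classes `Column` and `Dataset` and the functions `build_lookup` and `header_for` are hypothetical. In SAS, comparable information is available from the DICTIONARY.COLUMNS table in PROC SQL or from the SASHELP.VCOLUMN view.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Column:
    """One descriptor entry, loosely modeled on what PROC CONTENTS reports."""
    name: str
    type: str            # "num" or "char", as in SAS
    label: str = ""
    format: str = ""


@dataclass
class Dataset:
    columns: List[Column]
    rows: List[dict] = field(default_factory=list)


def build_lookup(ds: Dataset) -> Dict[str, Column]:
    """Generate a lookup table at runtime, keyed by upper-case variable name
    (SAS variable names are case-insensitive)."""
    return {c.name.upper(): c for c in ds.columns}


def header_for(lookup: Dict[str, Column], var: str) -> str:
    """Use the stored label as header; fall back to the variable name."""
    col = lookup[var.upper()]
    return col.label or col.name
```

Because headers come from the descriptor, relabeling a variable in the source data changes the output without any change to the module.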

INTRODUCTION

The Phase-III Macro System is intended to serve as the base for an extendable system that provides mechanisms for shaping input datasets, performing calculations and generating SAS datasets with ready-made text content. The following requirements are met:

  • Produce a wide variety of output with a minimum set of modules.
  • Minimize maintenance efforts through self-documenting and limited program code.
  • Be prepared to add new output structures without substantial delay.

The Phase-III Macro System is a highly interactive collection of macro modules. These modules provide transformation methods for the datasets emerging from a study, and they make use of all the information available in the descriptor portion of each dataset processed.

It provides subroutines that take care of data types, formats, labels, headers, missing values, loops and more. Runtime-generated information used to control processing is kept in standardized data structures using macro variable lists ("mlists"), SAS formats and datasets.
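In the simplest reading, a macro variable list is a delimited string stored in a single macro variable. The sketch below models such an mlist in Python. It assumes blank-delimited items, which is a common SAS convention; the system's exact format is not described here. The helper names are hypothetical, and the comments name the SAS macro functions they roughly correspond to.

```python
def _items(mlist: str, sep: str = " ") -> list:
    """Split an mlist into its non-empty items."""
    return [w for w in mlist.split(sep) if w]


def mlist_count(mlist: str, sep: str = " ") -> int:
    """Number of items, comparable to %sysfunc(countw(&mlist, %str( )))."""
    return len(_items(mlist, sep))


def mlist_item(mlist: str, i: int, sep: str = " ") -> str:
    """i-th item (1-based), comparable to %scan(&mlist, &i, %str( )).
    Like %scan, returns an empty string when i is out of range."""
    items = _items(mlist, sep)
    return items[i - 1] if 1 <= i <= len(items) else ""


def mlist_append(mlist: str, item: str, sep: str = " ") -> str:
    """Append an item, as in %let mlist = &mlist &item;"""
    return f"{mlist}{sep}{item}" if mlist.strip() else item
```

A module can then loop over `range(1, mlist_count(m) + 1)` and fetch each item with `mlist_item`. This mirrors the familiar `%do i = 1 %to ...` pattern in SAS.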

The user is provided with one or more output datasets containing character columns with standard names and externally controlled attributes. Finally, the Phase-III Macro System provides pre- and post-processing functionality such as condense, struct and missline.

SCOPE

The requirements, ideas, architecture and solution described here are taken from the statistical programming part of a clinical study for which I was contracted a few years ago. The study was based on excellent, precisely defined processes. As part of these processes, a booklet called the table shell served as the single reference for all tabulations to be performed and later submitted to the health authorities for approval.

SUSTAINABLE APPROACH

The usual approach would have been to start programming immediately, generating the first table according to its definition in the table shell and then continuing one by one. For some reason we did not do that. Perhaps the number of tables was too large to program them all both on time and in high quality. Or perhaps we expected a number of minor last-minute changes to consume more time than would be available after database closure or, worse, after unblinding.

REQUIREMENTS ANALYSIS

Instead, we started by investing some time in a closer look at the table definitions. Not surprisingly, we identified a limited number of similarities and differences, which resulted in two lists: the first listed the table structures found, and the second listed parameters. After that, every single entry in the table shell could be expressed as a variation and combination of one or more structures.
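The outcome of this analysis can be pictured as a small catalogue in which every table becomes a sequence of (structure, parameters) pairs. In the Python sketch below, the structure names are borrowed from the three cell types defined in the CELL TYPES section, purely for illustration. The parameter keys (`var`, `by`), the table identifiers and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple

# Hypothetical structure catalogue; the real list came from the table shell.
STRUCTURES = {"cont", "catv", "bool"}


@dataclass
class TableSpec:
    table_id: str
    parts: Tuple[Tuple[str, dict], ...]   # (structure, parameters) pairs


def spec(table_id: str, *parts: Tuple[str, dict]) -> TableSpec:
    """Express one table-shell entry as a combination of known structures."""
    for structure, _params in parts:
        if structure not in STRUCTURES:
            raise ValueError(f"unknown structure: {structure}")
    return TableSpec(table_id, tuple(parts))


def structures_used(specs: Iterable[TableSpec]) -> Set[str]:
    """Which structures (and hence which modules) a set of tables needs."""
    return {structure for s in specs for structure, _ in s.parts}
```

The point of the design is that adding a table means adding a specification, not a program. Only a genuinely new structure requires new code.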

GRANULARITY

As a result of the structural analysis, it was considered most useful to formulate table definitions as a grid of super-cells. These are cells that contain one or more values. Super-cells aggregate horizontally to form super-rows, and super-rows aggregate vertically to form an output table.
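The grid of super-cells maps naturally onto three nested containers. This Python sketch shows the containment only and is not the author's implementation. The class names are hypothetical, and it assumes that the values of a super-cell are joined with a blank when rendered.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SuperCell:
    values: List[str]                      # one or more formatted values

    def render(self, sep: str = " ") -> str:
        return sep.join(self.values)


@dataclass
class SuperRow:
    label: str
    cells: List[SuperCell]                 # super-cells aggregate horizontally


@dataclass
class Table:
    rows: List[SuperRow] = field(default_factory=list)  # super-rows stack vertically

    def render(self) -> List[List[str]]:
        """One list of character columns per super-row."""
        return [[r.label] + [c.render() for c in r.cells] for r in self.rows]
```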

CELL TYPES

Super-cell contents are managed at the character string level. Nevertheless, three categories are defined to reflect the data types of the original data reported: continuous variables ("cont"), categorical variables ("catv") and boolean variables ("bool").
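Because super-cell contents are character strings, each cell type needs a function that turns raw values into one or more formatted strings. The statistics below are assumed typical summaries, not taken from the paper: n, mean and SD for "cont", and count with percentage for "catv" and "bool". The function names are hypothetical. The sketch also assumes that missing values are represented as `None` and excluded from denominators.

```python
import statistics
from collections import Counter
from typing import List, Sequence


def _nonmissing(values: Sequence) -> list:
    return [v for v in values if v is not None]


def cont_cell(values: Sequence) -> List[str]:
    """'cont': n, then mean (SD); SD needs at least two values."""
    x = _nonmissing(values)
    if not x:
        return ["0", ""]
    sd_txt = f"{statistics.stdev(x):.2f}" if len(x) > 1 else "-"
    return [str(len(x)), f"{statistics.mean(x):.1f} ({sd_txt})"]


def catv_cell(values: Sequence, categories: Sequence[str]) -> List[str]:
    """'catv': count (percent) per category, in the given category order."""
    x = _nonmissing(values)
    counts, n = Counter(x), len(x)
    return [f"{counts[c]} ({100 * counts[c] / n:.1f}%)" if n else "0"
            for c in categories]


def bool_cell(values: Sequence) -> List[str]:
    """'bool': count (percent) of true values."""
    x = _nonmissing(values)
    k = sum(1 for v in x if v)
    return [f"{k} ({100 * k / len(x):.1f}%)" if x else "0"]
```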

IMPLEMENTATION

For each of these cell types SAS macros are coded to produce super-rows.

As can be seen from this example, a table typically consists of several super-rows of different type and depth.
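Producing a super-row then amounts to applying the cell function of the requested type to each treatment group. The following Python sketch uses a dictionary as a stand-in for "one macro per cell type". The names `CELL_FUNCS` and `super_row`, the grouping variable and the simplified statistics are hypothetical, and "catv" is omitted for brevity.

```python
from typing import Callable, Dict, List, Sequence


def _n_mean(values: list) -> List[str]:
    """Simplified 'cont' cell: n and mean of non-missing values."""
    x = [v for v in values if v is not None]
    return [str(len(x)), f"{sum(x) / len(x):.1f}" if x else ""]


def _n_true(values: list) -> List[str]:
    """Simplified 'bool' cell: number of true values."""
    return [str(sum(1 for v in values if v))]


# One generator per cell type ("catv" omitted for brevity).
CELL_FUNCS: Dict[str, Callable[[list], List[str]]] = {"cont": _n_mean, "bool": _n_true}


def super_row(records: Sequence[dict], cell_type: str, var: str,
              by: str, groups: Sequence[str], label: str) -> list:
    """Row label followed by one super-cell (list of strings) per group."""
    cell = CELL_FUNCS[cell_type]
    return [label] + [cell([r.get(var) for r in records if r.get(by) == g])
                      for g in groups]
```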

The term "depth" needs to be explained in more detail, since it is crucial to understanding the entire logic of the Phase-III Macro System.

DEPTH

Let’s accept for a moment the idea that complex structures can be generated by repeating simple structures and adding other simple structures, that is, by logical multiplication and addition. By iterating this operation, we can deliberately increase complexity without needing any functions besides multiplication and addition.

The architecture of the Phase-III Macro System is based on this concept and hence is capable of producing a theoretically unlimited set of tables with theoretically unlimited complexity.
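The logical multiplication and addition described above can be sketched with higher-order functions. A structure is a function from records to rows, `add` stacks structures, and `multiply` repeats a structure for every level of a variable. Nesting `multiply` inside `multiply` increases depth without any new primitive. All names are hypothetical, and this Python model only illustrates the idea.

```python
from typing import Callable, List, Sequence

Rows = List[List[str]]
Structure = Callable[[Sequence[dict]], Rows]


def add(*parts: Structure) -> Structure:
    """Logical addition: stack the rows of several structures."""
    def combined(records: Sequence[dict]) -> Rows:
        rows: Rows = []
        for part in parts:
            rows.extend(part(records))
        return rows
    return combined


def multiply(by: str, inner: Structure) -> Structure:
    """Logical multiplication: repeat a structure for every level of `by`."""
    def repeated(records: Sequence[dict]) -> Rows:
        rows: Rows = []
        for level in sorted({r[by] for r in records}):
            subset = [r for r in records if r[by] == level]
            rows.append([f"{by}={level}"])          # level header row
            rows.extend(inner(subset))
        return rows
    return repeated


def count_rows(records: Sequence[dict]) -> Rows:
    """The simplest structure: one row holding the record count."""
    return [["n", str(len(records))]]
```

For example, `add(count_rows, multiply("SEX", multiply("AGEGRP", count_rows)))` yields an overall count followed by counts nested two levels deep, built from nothing but the two operations.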

ARCHITECTURE

For the tabulation of clinical studies, the following architecture was found to be fully sufficient:

  1. All tabulation is performed by User Modules.
  2. These make use of functions provided by the Core Modules. Core Modules generate data tables that could be output but would not be well formatted.
  3. Service Modules do not produce such output data but carry functions that Core Modules need to work properly.
  4. Finally, Info Modules provide information that they obtain from the data dictionary and from other repositories such as dataset headers.
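The layering can be sketched as four groups of functions in which each layer calls only the layer below it. Everything here is hypothetical and written in Python for illustration, including the `DICTIONARY` contents, the function names and the fixed-width layout. In the real system, each layer consists of SAS macros.

```python
from typing import List, Optional

# Info module: information from the data dictionary (hypothetical content)
DICTIONARY = {"AGE": {"label": "Age (years)", "type": "num"}}


def info_label(var: str) -> str:
    return DICTIONARY.get(var, {}).get("label", var)


# Service modules: helpers that Core modules need to work properly
def svc_nonmissing(values: list) -> list:
    return [v for v in values if v is not None]


def svc_header(var: str) -> str:
    return info_label(var)


# Core module: computes a raw data table (correct, but not formatted)
def core_cont(records: List[dict], var: str) -> dict:
    x = svc_nonmissing([r.get(var) for r in records])
    mean: Optional[float] = sum(x) / len(x) if x else None
    return {"label": svc_header(var), "n": len(x), "mean": mean}


# User module: turns Core output into the final text lines
def user_table(records: List[dict], variables: List[str]) -> List[str]:
    lines = []
    for var in variables:
        raw = core_cont(records, var)
        mean_txt = f"{raw['mean']:.1f}" if raw["mean"] is not None else "-"
        lines.append(f"{raw['label']:<15}{raw['n']:>5}{mean_txt:>8}")
    return lines
```

Because each layer is small and depends only on the layer beneath it, a change in the data dictionary or in output layout stays confined to one layer.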

Depth in software or system architecture evidently supports module reusability and system maintainability. Moreover, it provides a means to limit the size of module source code, which in turn facilitates validation, error prevention and code maintenance.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Wolf Dieter Batz
Phenix-MTK GmbH
Wiesengrund 8
D-69234 Dielheim
+491772163609
+496222770095 (Fax)
batz@phenix-mtk.com
www.phenix-mtk.com

Brand and product names are trademarks of their respective companies.