A Permutation-based Test for Sequence Comparison

Abstract

Sequence analysis has seen recent advances as well as wider application in the social sciences. However, the literature offers no formal way to compare groups of sequences and determine whether they differ in a statistically meaningful way. To fill this gap, we propose a permutation test for comparing groups of social science sequences. We view a typical social science sequence, such as a life course, as having certain characteristics, such as the transition to first marriage, first birth, or first job, each of which contributes unique information. Therefore, in addition to testing overall differences between sequence groups via a sequence-based distance such as the Levenshtein distance, or a group-based distance such as the overall mean inter-medoid distance, we propose applying the permutation test to statistics that isolate specific aspects of sequences. Examples of such statistics include the relative frequency of transitions, the relative frequency of repeated events, and the timing of certain events. We apply the test to data from the German Life History Study (GLHS) on family formation of East and West German women.
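
The general shape of such a test can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study: the test statistic (mean between-group Levenshtein distance minus mean within-group distance), the function names, the unit edit costs, and the toy sequences (S = single, M = married, P = parent) are all hypothetical choices made for the example and are not GLHS data.

```python
import random
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two sequences with unit insert/delete/substitute costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def between_minus_within(dist, labels):
    """Assumed statistic: mean between-group distance minus mean within-group distance.

    Requires at least two groups and at least one group with two or more members.
    """
    between, within = [], []
    for i, j in combinations(range(len(labels)), 2):
        (within if labels[i] == labels[j] else between).append(dist[i][j])
    return sum(between) / len(between) - sum(within) / len(within)


def permutation_test(sequences, labels, n_perm=999, seed=0):
    """Return (observed statistic, one-sided permutation p-value)."""
    rng = random.Random(seed)
    n = len(sequences)
    # Distances do not depend on group labels, so compute them once.
    dist = [[levenshtein(sequences[i], sequences[j]) for j in range(n)]
            for i in range(n)]
    observed = between_minus_within(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign group membership at random
        if between_minus_within(dist, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): group A marries late, group B becomes a parent early.
toy_sequences = ["SSSSSMMP", "SSSSMMMP", "SSSSSMPP", "SSSSMMPP",
                 "SMPPPPPP", "SMMPPPPP", "SSMPPPPP", "SMMMPPPP"]
toy_labels = ["A", "A", "A", "A", "B", "B", "B", "B"]

observed, p_value = permutation_test(toy_sequences, toy_labels)
print(f"observed statistic = {observed:.3f}, p-value = {p_value:.3f}")
```

Replacing `between_minus_within` with a statistic that isolates one aspect of the sequences, such as the relative frequency of a given transition in each group, leaves the permutation loop unchanged, since only the labels are shuffled.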