Hapi: Haplotype Inference

Hapi: Rapid Haplotype Inference for Nuclear Families

Hapi is a program that efficiently infers minimum recombinant and maximum likelihood haplotypes for nuclear families. The program is available to download for non-profit use from this site.

Contact Amy Williams with questions or bug reports.

Downloading Hapi

A compiled version of Hapi that runs on 64-bit x86 Linux is available below; versions for other architectures are available upon request. This program is for non-profit use only. Contact Amy Williams for information about commercial licensing.

Download: hapi-1.03-x86_64.tgz

Using Hapi

Once downloaded, you can run the program hapi to see the following usage message. Note that Hapi uses the same input file formats as Merlin.

    me@myhost$ ./hapi-[mr/ml]
    Usage: hapi [OPTIONS] [marker list file] [map file] [pedigree file]
    
    Options:
      -l, --log <filename>                  log to file <filename>
      -d, --data-analysis                   run data analysis
      --print-fams-trunc <fileprefix>       print truncated pedigree file and quit
      --print-trans-homologs <min fam children>     print CSV file with transmitted
                                                    homologs for each child,
                                                    for specified family size
      --print-haplotypes <min fam children> print CSV file with haplotypes
                                            for specified family size
      --print-text                          print either transmitted homologs or
                                            haplotypes in text format (not CSV)
    
    
      --print-all-trans-homologs            print all transmitted homologs
                                            for all families to one large CSV file
                                            named 'all-trans-homologs.csv'

Required arguments are listed in brackets. These files are in the same format that Merlin uses. We describe each below. Note: the distribution of Hapi includes a simple example with each of these files.

Marker List File

This file lists the markers that occur in the pedigree file in the order they appear. Each line gives information about one marker. There are two columns: the first column is always an 'M' character (i.e., marker), and the second column is the name of the marker. For example, if the dataset contains three markers named rs1, rs2, and rs3, the marker list file would look like the following:

    M rs1
    M rs2
    M rs3

Note that the names of the markers need not be rs id numbers, but can be any sequence of characters. These names must be the same as those listed in the map file. Also note that the order of the markers has no physical meaning and can be arbitrary (thus markers on different chromosomes can occur intermixed, if desired). The map file specifies where the markers reside physically.
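As an illustration of the format just described, the following Python sketch reads a marker list and returns the marker names in file order. It is not part of Hapi; the function name `parse_marker_list` and the decision to skip blank lines are assumptions made for this example.

```python
def parse_marker_list(text):
    """Return marker names, in file order, from 'M <name>' lines."""
    names = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are skipped
        if len(fields) != 2 or fields[0] != "M":
            raise ValueError(
                f"line {lineno}: expected 'M <marker name>', got {line!r}"
            )
        names.append(fields[1])
    return names


# The three-marker example from the text above.
markers = parse_marker_list("M rs1\nM rs2\nM rs3\n")
print(markers)  # ['rs1', 'rs2', 'rs3']
```

The returned order matters because the genotype columns of the pedigree file follow the same order.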

Map File

The map file contains the genetic map that Hapi uses to perform its haplotyping calculations. Each line defines the location of one marker, and Hapi requires that the markers appear in ascending order of centiMorgan position. There are three columns on each line. The first column lists the chromosome number, with numbers 23 and 24 encoding the X and Y chromosomes, respectively. The second column gives the marker name (the same as in the marker list file). The third column lists the position of the marker in centiMorgans. Note that the map file can contain information about markers that are not included in the marker list and pedigree files, but there must be an entry for each marker in the marker list file.

The following is an example map file, continuing with the above example:

    1  rs1  1.00
    1  rs3  1.356
    1  rs2  1.895

The above example specifies that rs1, rs2, and rs3 appear on chromosome 1, with rs3 lying between rs1 and rs2. These markers are tightly linked, spanning a distance of 0.895 cM.
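The two requirements on the map file (ascending centiMorgan order, and an entry for every marker in the marker list) can be checked before running Hapi. The Python sketch below is one way to do that and is not part of Hapi. The names `parse_map` and `check_map` are hypothetical, and checking the order separately within each chromosome is an assumption about how the ordering rule applies when several chromosomes share a file.

```python
def parse_map(text):
    """Return (chromosome, marker name, cM position) tuples in file order."""
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are skipped
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 columns, got {line!r}")
        entries.append((fields[0], fields[1], float(fields[2])))
    return entries


def check_map(entries, marker_names):
    """Return a list of human-readable problems; empty if none are found."""
    problems = []
    last_cm = {}  # last position seen on each chromosome
    for chrom, name, cm in entries:
        # Assumption: ascending order is required within each chromosome.
        if chrom in last_cm and cm < last_cm[chrom]:
            problems.append(f"{name}: {cm} cM is out of order on chromosome {chrom}")
        last_cm[chrom] = cm
    mapped = {name for _, name, _ in entries}
    for name in marker_names:
        if name not in mapped:
            problems.append(f"{name}: in the marker list but missing from the map")
    return problems


MAP_TEXT = """\
1  rs1  1.00
1  rs3  1.356
1  rs2  1.895
"""

entries = parse_map(MAP_TEXT)
print(check_map(entries, ["rs1", "rs2", "rs3"]))  # []
span = entries[-1][2] - entries[0][2]
print(f"{span:.3f} cM")  # 0.895 cM
```

Extra markers in the map are not reported, since the map file is allowed to describe markers that the marker list omits.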

Pedigree File

The pedigree file contains genotype information about each sample individual as well as relationship information within the pedigree. Note that Hapi currently applies to nuclear families and analyzes the nuclear families within each pedigree separately. This means that Hapi performs haplotyping twice for an individual who is a parent in one family and a child in another.

Each line contains five required columns plus SNP genotypes for each of the markers listed in the marker list file. The first column lists a name for the family of the individual; Hapi ignores this character string (note: if this is problematic for your application, contact Amy Williams). The second column gives a numerical identifier for the individual within the pedigree; this value must be positive and non-zero. The third column lists the numerical identifier for the individual's father, and the fourth column lists the identifier for the mother. Use a '0' or 'x' character to designate an unknown father or mother. The fifth column lists the person's gender, with a 1 for males, 2 for females, and 0 for unknown. The remainder of the line contains the marker genotypes in the same order as in the marker list file. The alleles must be numerical identifiers, and the two alleles at a given locus can be separated by either a '/' or a space. The allele code 0 is reserved for missing data.

The following is an example pedigree file for a nuclear family with three children that includes genotypes for three loci:

    fam1   10   x     x     1      1/1   3/4   2/1
    fam1   12   x     x     2      2/1   3/4   2/2
    fam1   21   10    12    1      1/1   3/3   1/2
    fam1   22   10    12    1      1/2   3/4   1/2
    fam1   23   10    12    2      1/1   3/4   2/2

Person 10 is the father, 12 is the mother, and 21, 22, and 23 are the children. Children 21 and 22 are male and child 23 is female.
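Because Hapi analyzes each nuclear family separately, it can help to see how a larger pedigree decomposes. The Python sketch below is not Hapi's implementation; it simply groups individuals by their (father, mother) pair. The function name `nuclear_families`, the added individuals 30 and 41, and the choice to skip anyone with an unknown parent are all assumptions made for illustration.

```python
from collections import defaultdict

UNKNOWN = ("0", "x")  # codes for an unknown father or mother


def nuclear_families(individuals):
    """Map each (father, mother) pair to the list of their children.

    `individuals` is an iterable of (person_id, father_id, mother_id) strings.
    """
    families = defaultdict(list)
    for person, father, mother in individuals:
        # Assumption: a person with any unknown parent is not treated as a child.
        if father in UNKNOWN or mother in UNKNOWN:
            continue
        families[(father, mother)].append(person)
    return dict(families)


# The family from the example above, extended with a hypothetical
# spouse (30) and child (41) for person 21.
records = [
    ("10", "x", "x"),
    ("12", "x", "x"),
    ("21", "10", "12"),
    ("22", "10", "12"),
    ("23", "10", "12"),
    ("30", "x", "x"),
    ("41", "21", "30"),
]

fams = nuclear_families(records)
for parents, children in fams.items():
    print(parents, children)
```

In this hypothetical pedigree, person 21 is a child in the family of 10 and 12 and a parent in the family with 30, so that individual would be haplotyped in both families.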

If desired, a comment may appear on a line by itself if a '#' occurs as the first character. For example:

    # This is a comment and is ignored
    fam1   10   x     x     1      1/1   3/4   2/1
    ...
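The pedigree format described above, including comment lines, can be read with a short parser. The following Python sketch is an illustration and not part of Hapi. The function name `parse_pedigree`, the dictionary layout, and the use of `None` for unknown parents are assumptions of this example.

```python
def parse_pedigree(text, n_markers):
    """Parse pedigree text into a list of dicts, one per individual."""
    people = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Skip blank lines (an assumption) and lines whose first character is '#'.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 5:
            raise ValueError(f"line {lineno}: expected at least 5 columns")
        family, person, father, mother, sex = fields[:5]
        # Alleles may be separated by '/' or by a space, so normalize to spaces.
        alleles = " ".join(fields[5:]).replace("/", " ").split()
        if len(alleles) != 2 * n_markers:
            raise ValueError(
                f"line {lineno}: expected {2 * n_markers} alleles, got {len(alleles)}"
            )
        # Pair consecutive alleles; an allele of 0 means missing data.
        genotypes = [
            (int(alleles[i]), int(alleles[i + 1]))
            for i in range(0, len(alleles), 2)
        ]
        people.append({
            "family": family,
            "id": int(person),
            "father": None if father in ("0", "x") else int(father),
            "mother": None if mother in ("0", "x") else int(mother),
            "sex": int(sex),  # 1 = male, 2 = female, 0 = unknown
            "genotypes": genotypes,
        })
    return people


PED_TEXT = """\
# This is a comment and is ignored
fam1   10   x     x     1      1/1   3/4   2/1
fam1   12   x     x     2      2/1   3/4   2/2
fam1   21   10    12    1      1/1   3/3   1/2
fam1   22   10    12    1      1/2   3/4   1/2
fam1   23   10    12    2      1/1   3/4   2/2
"""

people = parse_pedigree(PED_TEXT, n_markers=3)
print(len(people))             # 5
print(people[2]["genotypes"])  # [(1, 1), (3, 3), (1, 2)]
```

Passing the number of markers from the marker list file lets the parser reject lines with too few or too many genotype columns.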
