I.M.A.G.E. Consortium | “Sharing resources to achieve a common goal — the discovery of all genes”
GenBank records are entered in a loose format. While this allows a great deal of freedom and serves the needs of many users, it makes the data difficult for computers to process. IMAGE uses the sequences stored in GenBank for many purposes, including our Imagene clustering software and our QC efforts. There are certain important pieces of data we try to determine from a GenBank record. While we have tried to make our criteria for parsing a GenBank record as robust as possible, our software relies on certain assumptions, which are explained here.
LOCUS       AA099559      436 bp    mRNA            EST       28-OCT-1996
DEFINITION  zl78a03.s1 Stratagene colon (#937204) Homo sapiens cDNA clone
            IMAGE:510700 3' similar to gb:D11086 CYTOKINE RECEPTOR COMMON
            GAMMA CHAIN PRECURSOR (HUMAN);, mRNA sequence.
ACCESSION   AA099559
NID         g1645633
VERSION     AA099559.1  GI:1645633
KEYWORDS    EST.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;
            Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 436)
  AUTHORS   Hillier,L., Lennon,G., Becker,M., Bonaldo,M.F., Chiapelli,B.,
            Chissoe,S., Dietrich,N., DuBuque,T., Favello,A., Gish,W.,
            Hawkins,M., Hultman,M., Kucaba,T., Lacy,M., Le,M., Le,N.,
            Mardis,E., Moore,B., Morris,M., Parsons,J., Prange,C., Rifkin,L.,
            Rohlfing,T., Schellenberg,K., Soares,M.B., Tan,F., Thierry-Meg,J.,
            Trevaskis,E., Underwood,K., Wohldmann,P., Waterston,R., Wilson,R.
            and Marra,M.
  TITLE     Generation and analysis of 280,000 human expressed sequence tags
  JOURNAL   Genome Res. 6 (9), 807-828 (1996)
  MEDLINE   97044478
COMMENT     Contact: Wilson RK
            Washington University School of Medicine
            4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
            Tel: 314 286 1800
            Fax: 314 286 1810
            Email: est@watson.wustl.edu
            This clone is available royalty-free through LLNL ; contact the
            Hypothetical Example Institution (example@ResearchInstitution.gov)
            for further information.
            Seq primer: -40M13 fwd. from Amersham
            High quality sequence stop: 353.
FEATURES             Location/Qualifiers
     source          1..436
                     /organism="Homo sapiens"
                     /db_xref="GDB:3843195"
                     /db_xref="taxon:9606"
                     /clone="IMAGE:510700"
                     /clone_lib="Stratagene colon (#937204)"
                     /lab_host="SOLR cells (kanamycin resistant)"
                     /note="Organ: colon; Vector: pBluescript SK-; Site_1: EcoRI;
                     Site_2: XhoI; Cloned unidirectionally. Primer: Oligo dT.
                     T-84 colonic epithelial cell line.
                     Average insert size: 1.0 kb; Uni-ZAP XR Vector;
                     ~5' adaptor sequence: 5' GAATTCGGCACGAG 3'
                     ~3' adaptor sequence: 5' CTCGAGTTTTTTTTTTTTTTTTTT 3'"
BASE COUNT      118 a     72 c    153 g     91 t      2 others
ORIGIN
        1 tttttttgat gattatcaac agaaacttta tttctcatcg gttcaggaac aatcggaggg
       61 tagatggaaa gaggaaggga gggaaagagg gagggaggaa gaatcctgcg aaaaggaagg
      121 gccagactga gggagaagaa aaacatgttc ggggcaaaag ggtaattctc aagtggggaa
      181 tgccaaatga aggggtgctt acatgggggc acaaaattcc aaatcagcca cagtggggtg
      241 aggtgagtat gagacgcagg tgggttgaat gaaggaaagt tagtaccact tagggctaca
      301 ggaccctggg gttcttcttg tcagaggatt gggggttcag gtttcaggct ttagggtgta
      361 acattggggg ggcccagtta ggggctattg ctggttngca tggngggggg ccccaggccc
      421 cctcccccaa gggccc
//
The LOCUS line is where we determine the GenBank accession number, the record type and the date, which in the example above are "AA099559", "EST" and "28-OCT-1996" respectively. Currently IMAGE only processes records of type "EST" and "PRI".
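A minimal sketch of how these three values might be pulled from a LOCUS line, assuming whitespace-separated fields laid out as in the example record. The names `parse_locus` and `ACCEPTED_TYPES` are illustrative only and are not taken from the IMAGE software.

```python
# Hypothetical helper: field positions are assumed from the example record.
ACCEPTED_TYPES = {"EST", "PRI"}  # the two record types the text says are processed

def parse_locus(line):
    """Return (accession, record_type, date), or None if the line is not usable."""
    fields = line.split()
    if len(fields) < 3 or fields[0] != "LOCUS":
        return None
    accession = fields[1]
    # In the example record the record type and the date are the last two fields.
    record_type, date = fields[-2], fields[-1]
    if record_type not in ACCEPTED_TYPES:
        return None  # records of other types are skipped
    return accession, record_type, date

print(parse_locus("LOCUS AA099559 436 bp mRNA EST 28-OCT-1996"))
# ('AA099559', 'EST', '28-OCT-1996')
```

Reading the type and date from the end of the line rather than from fixed columns is a design choice of this sketch; it tolerates variation in the middle fields but assumes the date always comes last.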
The DEFINITION field is where we determine the orientation of a clone by searching for the first occurrence of 5' or 3'. In the example above the first match is 3', so the EST is the 3 prime end.
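The first-occurrence search can be sketched as below, assuming the text searched is the DEFINITION field of the example record. The function name `clone_orientation` is hypothetical, and the check that the digit is not preceded by another digit is an added safeguard of this sketch, not something the description above specifies.

```python
import re

# Match 5' or 3', but not when the digit is the tail of a longer number.
ORIENTATION = re.compile(r"(?<!\d)[53]'")

def clone_orientation(definition):
    """Return "5'" or "3'" for the first marker found, or None if there is none."""
    match = ORIENTATION.search(definition)
    return match.group(0) if match else None

definition = ("zl78a03.s1 Stratagene colon (#937204) Homo sapiens cDNA clone "
              "IMAGE:510700 3' similar to gb:D11086 CYTOKINE RECEPTOR COMMON "
              "GAMMA CHAIN PRECURSOR (HUMAN);, mRNA sequence.")
print(clone_orientation(definition))  # 3'
```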
IMAGE also notes how much of the sequence is considered poor quality by searching the COMMENT field for the phrase "High quality sequence stop: ###", as in the example below:
High quality sequence stop: 353.
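A short sketch of this phrase search using a regular expression. The name `high_quality_stop` is hypothetical, and treating everything past the stop position as poor quality is an assumption about how the number is used.

```python
import re

# The ### in the phrase is taken to be a run of decimal digits.
HQ_STOP = re.compile(r"High quality sequence stop:\s*(\d+)")

def high_quality_stop(comment):
    """Return the high-quality stop position as an int, or None if absent."""
    match = HQ_STOP.search(comment)
    return int(match.group(1)) if match else None

stop = high_quality_stop("Seq primer: -40M13 fwd. from Amersham "
                         "High quality sequence stop: 353.")
print(stop)        # 353
print(436 - stop)  # 83 bases of the 436 bp example lie past the stop
```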
The most important part to us is the /clone qualifier in the FEATURES section:
/clone="IMAGE:510700"
In this one line we are able to determine our internal clone id and that this GenBank record describes an IMAGE clone. While we also try some secondary methods to determine this information, this is the safest way to ensure that data regarding an IMAGE clone can be retrieved by IMAGE.
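As an illustration, the clone id could be extracted from that qualifier as follows. This is a sketch assuming the qualifier always has the exact form `/clone="IMAGE:<digits>"`. The name `image_clone_id` is hypothetical, and the secondary methods mentioned above are not shown.

```python
import re

# Only the primary form of the qualifier is handled here.
CLONE_QUALIFIER = re.compile(r'/clone="IMAGE:(\d+)"')

def image_clone_id(record_text):
    """Return the IMAGE clone id as an int, or None if no IMAGE qualifier is found."""
    match = CLONE_QUALIFIER.search(record_text)
    return int(match.group(1)) if match else None

print(image_clone_id('/db_xref="taxon:9606" /clone="IMAGE:510700"'))  # 510700
print(image_clone_id('/clone="some other clone"'))                    # None
```

A return value of None signals that the record could not be identified as an IMAGE clone by this method alone.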
We also try to determine further data about a clone by searching the entire record for certain key phrases. The phrase "considered poor quality" marks a clone as low quality and the phrase "reversed clone" marks a clone as reversed.
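The key-phrase search might look like the sketch below. The function name `quality_flags` is hypothetical. Matching case-insensitively and collapsing whitespace, so that a phrase wrapped across two lines of the record still matches, are assumptions of this sketch rather than documented behaviour.

```python
def quality_flags(record_text):
    """Return flags for the two key phrases searched for in the whole record."""
    # Lower-case and collapse runs of whitespace (including line breaks).
    text = " ".join(record_text.lower().split())
    return {
        "low_quality": "considered poor quality" in text,
        "reversed": "reversed clone" in text,
    }

flags = quality_flags("COMMENT  This appears to be a reversed\n"
                      "         clone.")
print(flags)  # {'low_quality': False, 'reversed': True}
```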
Web page maintained by preston.hunter@asu.edu | Biological Questions and Comments to preston.hunter@asu.edu