This '# gnomap-version 1' format will likely be changed or improved somewhat. What is here is a working format that is efficient to use with the Java-based map display program.
The fields now defined are:

# fly/features-2L.tsv
# Features for fly from BDGP/Celera/FlyBase data [ 20-December-2000]
# gnomap-version 1
# Feature	gene	map	range	id	db_xref
source	fly	Chr 2L	1..22082480	-
gene	Nhe1	-	complement(98968..101637)	MEOW:FBgn0026787	FlyBase:FBan0012178,GadFly:CT9263	-
gene	M(2)21AB	-	102694..109351	MEOW:FBgn0005278	FlyBase:FBan0002674,GadFly:CT31903,GadFly:CT31899,GadFly:CT31907,GadFly:CT31901,GadFly:CT31790	-
gene	CG13694	-	complement(110631..111036)	MEOW:FBgn0031219	FlyBase:FBan0013694,GadFly:CT33151	-
gene	CG4822	-	complement(111997..115661)	MEOW:FBgn0031220	FlyBase:FBan0004822,GadFly:CT41972,GadFly:CT41960,GadFly:CT41970,GadFly:CT15205	-

A '-' or an empty field indicates a null value.
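As an illustration of how a reader might consume a version 1 line, here is a minimal Python sketch. It assumes that fields are separated by single tabs and takes its field names from the '# Feature gene map range id db_xref' header comment; the helper names (parse_v1_record, parse_location, null_if_empty) are hypothetical. It maps the null markers to None and decodes only the two location forms seen in the sample, "start..end" and "complement(start..end)".

```python
import re
from typing import Dict, Optional, Tuple

# Field names taken from the '# Feature gene map range id db_xref' header
# comment of the version 1 example (assumption: tab-separated columns).
V1_FIELDS = ["Feature", "gene", "map", "range", "id", "db_xref"]

# Only simple spans are handled: "start..end" or "complement(start..end)".
# join(...), fuzzy ends such as <1..>200 and other feature-table forms are not.
_LOCATION = re.compile(r"^(?:complement\((\d+)\.\.(\d+)\)|(\d+)\.\.(\d+))$")


def null_if_empty(value: str) -> Optional[str]:
    """Map the null markers '-' and '' to None."""
    value = value.strip()
    return None if value in ("", "-") else value


def parse_v1_record(line: str) -> Dict[str, Optional[str]]:
    """Split one version 1 line into named fields.

    Missing trailing fields become None; extra trailing fields are ignored.
    """
    values = line.rstrip("\n").split("\t")
    values += [""] * (len(V1_FIELDS) - len(values))
    return {name: null_if_empty(v) for name, v in zip(V1_FIELDS, values)}


def parse_location(loc: str) -> Tuple[int, int, str]:
    """Return (start, end, strand) for a simple span; strand is '+' or '-'."""
    m = _LOCATION.match(loc.strip())
    if m is None:
        raise ValueError(f"unsupported location: {loc!r}")
    if m.group(1) is not None:                      # complement(start..end)
        return int(m.group(1)), int(m.group(2)), "-"
    return int(m.group(3)), int(m.group(4)), "+"


rec = parse_v1_record("gene\tNhe1\t-\tcomplement(98968..101637)\t"
                      "MEOW:FBgn0026787\tFlyBase:FBan0012178,GadFly:CT9263\t-")
print(rec["gene"], rec["map"])           # Nhe1 None
print(parse_location(rec["range"]))      # (98968, 101637, '-')
```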
The general idea of the gnomap format is to capture all of a DDBJ/EMBL/GenBank feature table statement in one efficiently parseable line. A future format might use a 'merge'-style tab-separated file in which the first line is a set of field keys. These keys would be "Feature", "location", plus any set of qualifiers defined as in the D/E/G qualifier tag set of http://www.ncbi.nlm.nih.gov/collab/FT/index.html, with some variation to suit the need for fixed columns. Qualifier fields could be added to or removed from the table as needed. The parser should treat the first non-comment line as the list of field keys.
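The header-keyed rule lends itself to a small generic reader. The Python sketch below is one possible reading of that rule, not the map display program's parser; it assumes tab-separated fields and '#' comment lines, and the function name read_keyed_table is hypothetical. Because field names come from the key line rather than from fixed positions, adding or removing a qualifier column needs no code change.

```python
from typing import Dict, Iterable, Iterator, List, Optional


def read_keyed_table(lines: Iterable[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per record of a header-keyed gnomap table.

    '#' lines and blank lines are skipped. The first remaining line supplies
    the field keys; every later line is a record. '-' or empty values
    become None, and missing trailing fields are padded with None.
    """
    keys: Optional[List[str]] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if keys is None:
            keys = fields                      # first non-comment line
            continue
        fields += [""] * (len(keys) - len(fields))
        yield {k: (None if v.strip() in ("", "-") else v)
               for k, v in zip(keys, fields)}


# Two records from the version 2a sample, written with explicit tabs.
sample = [
    "# gnomap-version 2a",
    "Feature\tlocation\tgene\tmap\tid\tdb_xref\tnote",
    "source\t1..22082480\tfly\tChr 2L\t-",
    "gene\tcomplement(98968..101637)\tNhe1\t-\tMEOW:FBgn0026787\t"
    "FlyBase:FBan0012178,GadFly:CT9263\t-",
]
for row in read_keyed_table(sample):
    print(row["Feature"], row["location"], row["gene"], row["id"])
# source 1..22082480 fly None
# gene complement(98968..101637) Nhe1 MEOW:FBgn0026787
```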
Alternatively, the qualifiers could be concatenated in a single field, with a /key="value" ; /key2="value2" structure. Your comments are welcome.
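A single-field qualifier string could be split back into key/value pairs along these lines. This is a Python sketch under two assumptions: items are separated by ';' and a ';' never appears inside a value. It accepts both quoted values (the /key="value" form proposed here) and bare values (as written in the version 2b sample). The function name parse_qualifiers is hypothetical.

```python
import re
from typing import Dict, List

# One '/key=value' item; the value may be double-quoted or bare.
_QUALIFIER = re.compile(r'^/([^=\s]+)=(?:"(.*)"|(.*))$')


def parse_qualifiers(field: str) -> Dict[str, List[str]]:
    """Parse '/key="value" ; /key2="value2"' into {key: [values...]}.

    Values are kept in lists because a key may repeat within one feature.
    Assumes ';' never occurs inside a value.
    """
    result: Dict[str, List[str]] = {}
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        m = _QUALIFIER.match(item)
        if m is None:
            raise ValueError(f"malformed qualifier: {item!r}")
        value = m.group(2) if m.group(2) is not None else m.group(3)
        result.setdefault(m.group(1), []).append(value)
    return result


print(parse_qualifiers(
    "/gene=Nhe1 ; /id=MEOW:FBgn0026787 ; /db_xref=FlyBase:FBan0012178,GadFly:CT9263"))
# {'gene': ['Nhe1'], 'id': ['MEOW:FBgn0026787'], 'db_xref': ['FlyBase:FBan0012178,GadFly:CT9263']}
print(parse_qualifiers('/organism="fly" ; /chromosome="2L"'))
# {'organism': ['fly'], 'chromosome': ['2L']}
```

Lists are used for values because feature tables allow some qualifiers, such as /db_xref, to appear more than once on a feature.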

# fly/features-2L.tsv
# Features for fly from BDGP/Celera/FlyBase data [ 20-December-2000]
# gnomap-version 2a
Feature	location	gene	map	id	db_xref	note
source	1..22082480	fly	Chr 2L	-
gene	complement(98968..101637)	Nhe1	-	MEOW:FBgn0026787	FlyBase:FBan0012178,GadFly:CT9263	-
gene	102694..109351	M(2)21AB	-	MEOW:FBgn0005278	FlyBase:FBan0002674,GadFly:CT31903,GadFly:CT31899,GadFly:CT31907,GadFly:CT31901,GadFly:CT31790	-
gene	complement(110631..111036)	CG13694	-	MEOW:FBgn0031219	FlyBase:FBan0013694,GadFly:CT33151	-
gene	complement(111997..115661)	CG4822	-	MEOW:FBgn0031220	FlyBase:FBan0004822,GadFly:CT41972,GadFly:CT41960,GadFly:CT41970,GadFly:CT15205	-

# fly/features-2L.tsv
# Features for fly from BDGP/Celera/FlyBase data [ 20-December-2000]
# gnomap-version 2b
Feature	location	qualifiers
source	1..22082480	/organism=fly ; /chromosome=2L
gene	complement(98968..101637)	/gene=Nhe1 ; /id=MEOW:FBgn0026787 ; /db_xref=FlyBase:FBan0012178,GadFly:CT9263
gene	102694..109351	/gene=M(2)21AB ; /id=MEOW:FBgn0005278 ; /db_xref=FlyBase:FBan0002674,GadFly:CT31903,GadFly:CT31899,GadFly:CT31907,GadFly:CT31901,GadFly:CT31790	-

Each feature-xxx.tsv file has two associated files, feature-xxx.tsv.idx and feature-xxx.tsv.ranges. The feature-xxx.tsv.ranges file consists of lines of

base-start | file-index
OR
class-name | file-index | location

where file-index indexes into the feature-xxx.tsv file. The map display program uses it to read subranges of features efficiently.
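One way to use the base-start lines is a binary search: to show features from base lo onward, find the last checkpoint at or before lo and start reading feature-xxx.tsv from its file-index. The Python sketch below assumes tab-separated 'base-start, file-index' lines sorted by base-start, treats file-index as an opaque integer (whether it counts bytes or records is not specified here), and skips the class-name lines. The function names are hypothetical. It also assumes features are stored in start order, so a reader can stop once features start past the end of the wanted subrange; a long feature that starts before the checkpoint but overlaps lo would need extra handling.

```python
import bisect
from typing import Iterable, List, Tuple


def load_ranges(lines: Iterable[str]) -> Tuple[List[int], List[int]]:
    """Collect 'base-start<TAB>file-index' lines into two parallel lists.

    Assumes tab separation and that the lines are sorted by base-start.
    'class-name | file-index | location' lines are skipped in this sketch.
    """
    starts: List[int] = []
    indexes: List[int] = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        starts.append(int(parts[0]))
        indexes.append(int(parts[1]))
    return starts, indexes


def start_index_for(base: int, starts: List[int], indexes: List[int]) -> int:
    """file-index of the last entry with base-start <= base (else the first)."""
    i = bisect.bisect_right(starts, base) - 1
    return indexes[max(i, 0)]


# Hypothetical .ranges contents: base-start, file-index pairs.
starts, indexes = load_ranges(["1\t0", "100000\t4096", "200000\t9000",
                               "gene\t12\t102694..109351"])
print(start_index_for(150000, starts, indexes))   # 4096
```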
The idmap.tsv table is a list of

ID | Chromosome | base-start..base-end

for all chromosomes. It can be used to look up a location by ID.
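For a small map, the simplest ID lookup is to load idmap.tsv into a dictionary. This Python sketch assumes the three columns are tab-separated and that '#' marks comment lines; load_idmap and the sample rows are hypothetical. For large tables, the .idx byte index offers random access without loading the whole file.

```python
from typing import Dict, Iterable, Tuple

# (chromosome, base-start, base-end)
Location = Tuple[str, int, int]


def load_idmap(lines: Iterable[str]) -> Dict[str, Location]:
    """Build an ID -> (chromosome, start, end) lookup from idmap.tsv lines.

    Assumes tab-separated ID, Chromosome and base-start..base-end columns;
    '#' comment lines and blank lines are skipped.
    """
    table: Dict[str, Location] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        ident, chrom, span = line.split("\t")[:3]
        start, end = span.split("..")
        table[ident] = (chrom, int(start), int(end))
    return table


# Hypothetical rows; the real ID and chromosome spellings may differ.
idmap = load_idmap(["FBgn0026787\t2L\t98968..101637",
                    "FBgn0005278\t2L\t102694..109351"])
print(idmap["FBgn0005278"])   # ('2L', 102694, 109351)
```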
The feature-xxx.tsv.idx and idmap.tsv.idx files are byte indexes into their respective files, keyed on numeric ID: the entry for an ID starts at byte offset idvalue * 8. Each entry holds the record offset and record length (two 4-byte integers) of that record in feature-xxx.tsv or idmap.tsv.
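The fixed 8-byte entries allow a record to be fetched with two seeks and no scanning. The Python sketch below assumes the two integers are big-endian and signed, which is what Java's DataOutputStream.writeInt produces; the byte order is not stated above, so check it against real files. How a textual ID maps to its numeric idvalue is also left open, and the function names and in-memory sample files are hypothetical.

```python
import io
import struct
from typing import BinaryIO, Tuple

# Assumption: big-endian signed 32-bit integers (Java DataOutputStream order).
_ENTRY = struct.Struct(">ii")   # 8 bytes: record-offset, record-length


def lookup_entry(idx: BinaryIO, idvalue: int) -> Tuple[int, int]:
    """Return (record_offset, record_length) stored for a numeric ID."""
    idx.seek(idvalue * _ENTRY.size)           # idvalue * 8
    data = idx.read(_ENTRY.size)
    if len(data) != _ENTRY.size:
        raise KeyError(idvalue)               # ID beyond the end of the index
    offset, length = _ENTRY.unpack(data)
    return offset, length


def read_record(idx: BinaryIO, data_file: BinaryIO, idvalue: int) -> bytes:
    """Fetch the raw record for idvalue from feature-xxx.tsv or idmap.tsv."""
    offset, length = lookup_entry(idx, idvalue)
    data_file.seek(offset)
    return data_file.read(length)


# In-memory stand-ins for a data file and its .idx file (hypothetical contents).
records = b"first\tline\nsecond\tline\n"
index = _ENTRY.pack(0, 11) + _ENTRY.pack(11, 12)
print(read_record(io.BytesIO(index), io.BytesIO(records), 1))   # b'second\tline\n'
```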
Don Gilbert --- March 2001