LuceGene search command-line use notes --- August 2004
============================================================

Command line calls to the lucegene search shell are the same as the
interactive search commands. Use ';' as end-of-line to separate commands,
and do searches, library info, formatting, and retrievals from the
command line.

With the shell script lucegene-search.sh, you can list several commands
to process either way:

   -c 'format xml' -c 'setpage 10' -c 'find name:apple'
   -c 'format xml;setpage 10;find name:apple'

The command syntax is not stable and will likely change, though most
likely by adding commands. These commands are probably stable:

  directory                        -- list of libs
  library [name]                   -- lib info
  find {querystring}
  lookup lib id ; lookup lib field term  -- find and return single match
  list terms field:[partialword]   -- dictionary list
  format [format type]
  fields [output field list]
  setpage {pagesize}               -- the default non-interactive pagesize
                                      is unlimited (all)

The dbs/lucegene/libname.properties files set search defaults like
format (table) as well as index settings. The bin/lucegene-search.sh
shell script applies some default settings, such as finding the index
and dbs/lucegene folders.
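The two equivalent ways of passing commands described above (several -c
options, or one -c option with ';' between commands) can be bridged with
a small wrapper. This is only a sketch: join_commands is a hypothetical
helper name, not part of LuceGene, and the only option it relies on is
the -c option shown in these notes.

```shell
#!/bin/sh
# Hypothetical helper (not part of LuceGene): join several search
# commands into one ';'-separated string for a single -c option.
join_commands() {
  joined=""
  for cmd in "$@"; do
    if [ -z "$joined" ]; then
      joined="$cmd"            # first command: no separator yet
    else
      joined="$joined;$cmd"    # later commands: ';' acts as end-of-line
    fi
  done
  printf '%s\n' "$joined"
}

# Usage sketch (assumes bin/lucegene-search.sh is installed as described):
# bin/lucegene-search.sh -c "$(join_commands 'format xml' 'setpage 10' 'find name:apple')"
```

Keeping the commands as separate arguments until the last moment makes it
easier to build the list in a script, for example by appending a 'setpage'
command only when a page size was requested.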
See also
  http://www.gmod.org/lucegene/
  http://www.gmod.org/lucegene/lucegene-readme.txt
  http://www.gmod.org/lucegene/lucegene-index-example.txt

== Example command line use =======================================

bin/lucegene-search.sh -c 'help'

-- list libraries in index directory
bin/lucegene-search.sh -c 'directory'

-- list library info w/ fields, value ranges
bin/lucegene-search.sh -lib seqs -c 'library'

-- list terms and counts in a field of a lib
bin/lucegene-search.sh -lib seqs -c 'list terms type:'
bin/lucegene-search.sh -lib seqs -c 'list terms name:adk'

-- basic two field search with default results
bin/lucegene-search.sh -lib seqs -c 'find type:gene +name:en'
docid   docclass  name  loc                              len   url
CG9015  gene      en    2R:complement(6588717..6592923)  4207  dmel_2R_gene_r3.2.1.fasta,4215187-4219672

bin/lucegene-search.sh -lib seqs -c 'format native;find type:gene +name:en'
>CG9015 type=gene; loc=2R:complement(6588717..6592923); ID=CG9015; name=en; dbxref=FlyBase:FBan0009015,FlyBase:FBgn0000577,FlyBase:FBgn0003492,FlyBase:FBgn0014157,FlyBase:FBgn0016969; len=4207
GAGGGAGCGAGCGAGAGAGCGCTCTGGCCAGCTAATAGGAGTGAGTGAGC
CGGCGAAACCGGTTCGCATGGGGCAGGTGACAAGGCTAAGAGAGAGCGAA
...

bin/lucegene-search.sh -lib fban -c 'find GSYM:toy'
docid        ARM  BLOC.start  SCAF      GSYM  ID
FBan0011186  4    1009180     AE003846  toy   FBgn0019650;FBan0011186

bin/lucegene-search.sh -lib fban -c 'format text;fields ID,GSYM,SYM,SCAF,ARM;find GSYM:toy'
# doc i=0
ID    FBgn0019650
ID    FBan0011186
GSYM  toy
SYM   CG11186
SCAF  AE003846
ARM   4

====== here is cute example of phrase search in reference abstracts ======
fbrf.acode and fbrf.abstr.acode indexed together ... could merge medline.xml abst?

bin/lucegene-search.sh -verbose=1 -c 'library fbrf;set page 2' \
  -c 'find "cytoplasmic myosin II"; get 0-1'
# pagesize=2
# Search for: all:"cytoplasmic myosin ii"
# Match 2 of 163560 documents ; 531 ms search time
docid  title  url
FBrf0144488  Asymmetric cell division in the embryonic CNS.
FBrf.abstr.acode,49556-52004
FBrf0146376  Cortical recruitment of myosin II is regulated by Cdc2 kinase in Drosophila syncytial embryo.  FBrf.abstr.acode,8353432-8355840

Using the shell script lucegene-search.sh, you can list several commands to process.

bin/lucegene-search.sh -lib gnomap_dmelhet \
  -c "format xml" \
  -c "directory" \
  -c "list terms chr:" \
  -c "setpage 5" \
  -c "fields all" \
  -c "find chr:2h +range.start:[100000 200000] +range.stop:[100000 200000]"\
  -c "format table" \
  -c "fields feature chr range.start range.stop id name" \
  -c "find chr:3h +range.stop:[200000 999999] +range.start:[000000 300000]"\
  |& more

There is now a default library 'libs' which just indexes dbs/lucegene/properties

bin/lucegene-search.sh -c 'find a*'
docid      title                             url
gamexml    FlyBase Annotation GAME XML       gamexml1.properties,0-8043
gamexml    FlyBase Annotation GAME XML       gamexml.properties,0-8744
fban       FlyBase Gene Annotation           fban.properties,0-3921
fbgnr      FlyBase Genes                     fbgnr.properties,0-4198
fbgnx      FlyBase Genes                     fbgnx.properties,0-4269
libs fbgn  FlyBase Genes                     fbgn.properties,0-4206
fbrf                                         fbrf.properties,0-2482
docs       FlyBase Documents                 docs.properties,0-4235
ugpxml     FlyBase Unified Gene Page XML     ugpxml.properties,0-6966
seqs       FlyBase Sequences                 seqs.properties,0-4889
fbgns      FlyBase Genes                     fbgnslim.properties,0-3882
docs       FlyBase Documents                 pdf.properties,0-2328
gnomap     FlyBase Genome Feaures            gnomap.properties,0-3343
go         Gene Ontology                     go.properties,0-3803
libs       FlyBase Lucegene Databases        libs.properties,0-1709
table                                        table.properties,0-3005

bin/lucegene-search.sh -c 'help'
Query help:
  term(s) ; fieldname:term ; [+/-] - precede to require/prohibit ;
  'all:terms(s)' to search all fields
  e.g., term AND w?ldc*rds OR 'phrase here'
  e.g., (filename:query) +(contents:query) -(description:query)
  'search {querystring}'
  'explain 1' to explain match of doc 1
  'get 1 or 5-50' to view full doc(s)
  'list 1 or 5-50' to view doc(s) fields
  'list terms {field}' to list term counts in field index
Directory commands: directory,
  library name, lookup lib id, lookup lib field value
  format, format name -- get/set output format
  fields, fields flda,fldb -- get/set output fields
  setpage 10   set page size
  next         next page
  'help' -- this doc
  'quit' -- stop

bin/lucegene-search.sh -c 'directory'
# doc i=0
library  ugpxml
library  table
library  seqs
library  libs
library  go
library  gnomap
library  gamexml
library  fbrf
library  fbgn
library  fban
library  dummy
library  docs
count    12
title    FlyBase Lucegene Databases
docid    directory

bin/lucegene-search.sh -lib seqs -c 'list terms type:'
Terms in dictionary for 'type:' start=0 for total docs 124305
#docs   Field:Value
18747   type:cds
18621   type:five_prime_utr
13472   type:gene
16153   type:intron
65      type:miscrna
40      type:pseudogene
96      type:rrna
28      type:snorna
28      type:snrna
18590   type:three_prime_utr
19307   type:transcript
17164   type:translation
1572    type:transposable_element
288     type:trna
found=14
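
The 'list terms' listing above is sorted by term name. If a ranking by
document count is wanted, the output can be post-processed with standard
tools. The following is a minimal sketch: top_terms is a hypothetical
helper name, and it assumes the output has the shape shown above (a count,
whitespace, then field:value, with header and 'found=' lines that do not
start with a number).

```shell
#!/bin/sh
# Hypothetical helper (not part of LuceGene): read 'list terms' output on
# stdin and print the N terms with the highest document counts.
# Assumes data lines look like "<count> <field>:<value>".
top_terms() {
  n="${1:-5}"   # number of terms to print, default 5
  # Keep only lines whose first column is a number and whose second
  # column contains ':'; this drops the header and 'found=' lines.
  awk '$1 ~ /^[0-9]+$/ && $2 ~ /:/ { print $1 "\t" $2 }' |
    sort -k1,1nr |
    head -n "$n"
}

# Usage sketch (assumes the seqs library from the examples above):
# bin/lucegene-search.sh -lib seqs -c 'list terms type:' | top_terms 3
```

Filtering on the line shape rather than skipping a fixed number of header
lines keeps the helper working if the header wording changes, at the cost
of silently ignoring any data line that does not match the assumed layout.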