AllTheBacteria

assembly, bacteria, bioinformatics, fasta, genomic, life sciences, microbial genomics, short read sequencing, whole genome sequencing

Description

All bacterial isolate whole-genome sequencing data from INSDC, uniformly assembled, quality-controlled, annotated, and searchable.

Update Frequency

The current release covers all SRA bacterial isolate data up to August 2024. The collection will be updated occasionally, with no fixed schedule.

License

MIT License

Documentation

https://allthebacteria.org

Managed By

European Bioinformatics Institute

Contact

https://github.com/AllTheBacteria/AllTheBacteria/issues

How to Cite

AllTheBacteria was accessed on DATE from https://registry.opendata.aws/allthebacteria.

Resources on AWS

  • Description
    Individual, compressed genome assemblies in .fasta format in a public S3 bucket.
    Resource type
    S3 Bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::allthebacteria-assemblies
    AWS Region
    eu-west-2
    AWS CLI Access (No AWS account required)
    aws s3 ls --no-sign-request s3://allthebacteria-assemblies/
  • Description
    Phylogenetically compressed, batched xz archives of all genome assemblies in .fasta format in a public S3 bucket.
    Resource type
    S3 Bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::allthebacteria-phylogeneticbatches
    AWS Region
    eu-west-2
    AWS CLI Access (No AWS account required)
    aws s3 ls --no-sign-request s3://allthebacteria-phylogeneticbatches/
  • Description
    Metadata for each genome assembly, including taxonomic information, in a public S3 bucket.
    Resource type
    S3 Bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::allthebacteria-metadata
    AWS Region
    eu-west-2
    AWS CLI Access (No AWS account required)
    aws s3 ls --no-sign-request s3://allthebacteria-metadata/
  • Description
    A LexicMap index of all genome assemblies. This can be used for efficient sequence alignment against all genomes.
    Resource type
    S3 Bucket
    Amazon Resource Name (ARN)
    arn:aws:s3:::allthebacteria-lexicmap
    AWS Region
    eu-west-2
    AWS CLI Access (No AWS account required)
    aws s3 ls --no-sign-request s3://allthebacteria-lexicmap/
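
The four buckets above share a naming pattern and a region, so the anonymous access commands can be wrapped in a few small shell helpers. The following is a minimal sketch, not part of AllTheBacteria or the AWS CLI: the function names atb_uri, atb_ls, and atb_cp are hypothetical, and it assumes the AWS CLI is installed. The layout of objects inside each bucket is not described in this entry, so any key passed to these helpers should come from a prior listing.

```shell
#!/usr/bin/env bash
# Sketch: thin wrappers around the anonymous AWS CLI access shown above.
# atb_uri, atb_ls and atb_cp are hypothetical helper names, not an existing tool.

ATB_REGION="eu-west-2"   # region listed for all four buckets

# Build an S3 URI from a bucket suffix (assemblies, phylogeneticbatches,
# metadata, lexicmap) and an optional key or prefix.
atb_uri() {
  printf 's3://allthebacteria-%s/%s\n' "$1" "${2:-}"
}

# List a bucket or prefix without AWS credentials.
atb_ls() {
  aws s3 ls --no-sign-request --region "$ATB_REGION" "$(atb_uri "$1" "${2:-}")"
}

# Copy one object to a local destination (default: current directory).
atb_cp() {
  aws s3 cp --no-sign-request --region "$ATB_REGION" "$(atb_uri "$1" "$2")" "${3:-.}"
}
```

After sourcing the file, atb_ls metadata runs the same listing as the command given for the metadata bucket, with the region set explicitly. atb_cp takes the bucket suffix, an object key taken from a listing, and an optional local destination.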
