Derivative data (datasets derived from primary data, such as BLAST databases, FASTA sequences for coding sequences, and analysis results) take up 75TB of storage. These are currently cached for about 1 month, and files not used after that are deleted (don't worry, they get regenerated).
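For anyone curious what a cache policy like this might look like, here is a minimal sketch. It is not CoGe's actual cleanup code. It assumes "not used" means "not accessed" as recorded by the file's access time, and it treats "about 1 month" as 30 days. The function name `purge_stale_files` and its parameters are made up for illustration. Note that access times are only meaningful if the filesystem records them (some are mounted with atime updates disabled).

```python
import os
import time
from pathlib import Path

# Assumption: "about 1 month" is taken to be 30 days.
MAX_AGE_SECONDS = 30 * 24 * 60 * 60


def purge_stale_files(cache_dir, max_age=MAX_AGE_SECONDS, now=None):
    """Delete files under cache_dir whose last access is older than max_age.

    Returns the list of paths that were deleted. Directories are left alone.
    """
    now = time.time() if now is None else now
    removed = []
    # Materialize the listing first so deleting does not disturb the walk.
    for path in list(Path(cache_dir).rglob("*")):
        if path.is_file() and now - path.stat().st_atime > max_age:
            path.unlink()
            removed.append(path)
    return removed
```

A job like this would typically be run on a schedule (for example, nightly), with the deleted files regenerated on demand the next time someone asks for them.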
Some facts about CoGe (since I needed this information):
Relational Database: MySQL
Contains over 8,000,000,000 rows
Total size 2.2TB
Only contains metadata about genomes (e.g., organism, genomic features such as genes, etc.). Has no primary sequence, variant, RNASeq, or MethylSeq data.
CoGe is back up!!!
Database was updated successfully. My preliminary set of tests (load genome, load annotations, SynMap, GEvo, and CoGeBlast) all worked without any issues. Please email [email protected] if you run into any problems.
Last table is still updating. I also did a double check and found one more table that needed to be updated. I'm hoping that tomorrow will be the day CoGe comes back. If not, I will be checking over the weekend and will work to get CoGe up as soon as possible.
CoGe table updates completed! I'm moving the database to a dedicated partition (I don't want it to fill up due to other system activity). Hoping to start testing in the next hour or two.
Progress report on CoGe: 4/5 tables have been updated to BIGINT for feature_ids. One table left. I've been running these updates in parallel, so I am hopeful that the final update will be finished sometime today.
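Why BIGINT? The posts here don't say what type the feature_id columns had before, so treat the following as a sketch under that assumption: if an id column is a 32-bit INT, it tops out at 2,147,483,647 (signed) or 4,294,967,295 (unsigned), and ids beyond that need a 64-bit BIGINT. The helper `smallest_unsigned_type` is hypothetical, and the 8,000,000,000 figure is just the total row count quoted for the database, used for scale.

```python
# Upper bounds of MySQL's unsigned integer column types, smallest first.
MYSQL_UNSIGNED_MAX = {
    "TINYINT": 2**8 - 1,
    "SMALLINT": 2**16 - 1,
    "MEDIUMINT": 2**24 - 1,
    "INT": 2**32 - 1,      # 4,294,967,295
    "BIGINT": 2**64 - 1,
}


def smallest_unsigned_type(max_id):
    """Return the smallest MySQL unsigned integer type that can hold max_id."""
    if max_id < 0:
        raise ValueError("ids are assumed to be non-negative")
    for name, limit in MYSQL_UNSIGNED_MAX.items():
        if max_id <= limit:
            return name
    raise ValueError("value does not fit in any MySQL integer type")


# Illustration only: an id as large as the database's total row count.
print(smallest_unsigned_type(8_000_000_000))
```

Changing a column's type on tables of this size generally means rewriting the table, which is consistent with the updates taking days and needing extra disk space.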
Database backups completed. Started the table updates last night, but saw that space was going to be limiting on the partition holding the database. A copy of the database has been installed on a larger partition and is currently having the first table updated.
We had some additional difficulty because CoGe uses CyVerse's LDAP for authentication, which also went down (both systems had no problems for years and then crash at the same time -- how do they know?)