@timfduffy Contrary to commonly accepted "wisdom", t-SNE normally converges to the same embedding up to rotation, regardless of initialization/seed. The key is to reduce the exaggeration gradually to 1.0 during training.
Scientists say they have made some of the first direct measurements of how long it takes an individual, ordinary protein to fold – and the results were surprising.
https://t.co/gs7UUNBLke
@hxiao The perplexity parameter for t-SNE is likely too small in this demo. If you choose the value 500, instead of the default value 50, the resulting embedding will have better cluster separation.
Ideally, a t-SNE embedding grows during its learning process from a tiny spot to a short bar that then elongates, splits and expands into clusters of various shapes with gradient borders. The following is an embedding of 6K proteins based on their 3D coordinates:
t-SNE & Initialization. Because of its exaggerated learning phase, t-SNE embeddings normally converge to a single final map (up to rotation). This makes dedicated initialization largely irrelevant. For this reason, VisuMap's implementation only supports random initialization.
t-SNE & learning-rate. A key feature of t-SNE is its use of adaptive gains for each optimized variable. This makes the learning-rate parameter largely irrelevant. For this reason, VisuMap's implementation has a fixed learning rate (500); users don't have to care about it.
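To illustrate what "adaptive gains for each optimized variable" can look like, here is a minimal sketch of one update step. The gain constants (add 0.2, multiply by 0.8, floor of 0.01) and the momentum value follow the scheme used in the original reference implementation of t-SNE; whether VisuMap uses the same values is an assumption. The function name `adaptive_gain_step` is hypothetical.

```python
def adaptive_gain_step(y, grad, velocity, gains,
                       learning_rate=500.0, momentum=0.8, min_gain=0.01):
    """One descent step with a separate gain per optimized variable.

    All arguments are flat lists of floats of equal length.
    Returns the updated (y, velocity, gains).
    """
    new_y, new_velocity, new_gains = [], [], []
    for yi, gi, vi, gain in zip(y, grad, velocity, gains):
        # The last move was along `vi`. A gradient of opposite sign means the
        # descent direction is unchanged, so the gain grows; a gradient of the
        # same sign means the direction flipped, so the gain shrinks.
        if (gi > 0.0) != (vi > 0.0):
            gain = gain + 0.2
        else:
            gain = gain * 0.8
        gain = max(gain, min_gain)
        vi = momentum * vi - learning_rate * gain * gi
        new_y.append(yi + vi)
        new_velocity.append(vi)
        new_gains.append(gain)
    return new_y, new_velocity, new_gains
```

Because each gain rescales the step of its own variable, a poorly chosen global learning rate is partly compensated after a few iterations, which is consistent with the point made above.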
@old_crone_code @mudscryer t-SNE should converge to a unique embedding up to a rotation if you choose a large enough perplexity and reduce the exaggeration slowly from, say, 5.0 to 1.0. You can try the implementation of t-SNE in VisuMap.
Protein Atlas with t-SNE: We can vectorize amino-acid chains with a Fourier transform and embed them with t-SNE into 2D space. Here is a map of ca. 100K AA-chains. The map reveals the similarity between the 3D structures.
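A minimal sketch of such a slowly decreasing exaggeration schedule. The linear shape and the `decay_fraction` parameter are assumptions chosen for illustration; the post only says the factor should fall slowly from about 5.0 to 1.0. The function name `exaggeration_schedule` is hypothetical and is not VisuMap's API.

```python
def exaggeration_schedule(n_iter, start=5.0, end=1.0, decay_fraction=0.8):
    """Return one exaggeration factor per iteration.

    The factor falls linearly from `start` to `end` over the first
    `decay_fraction` of the run and then stays at `end`.
    """
    decay_steps = max(1, int(n_iter * decay_fraction))
    schedule = []
    for step in range(n_iter):
        if step >= decay_steps:
            schedule.append(end)
        else:
            t = step / decay_steps  # progress through the decay phase, 0 <= t < 1
            schedule.append(start + (end - start) * t)
    return schedule
```

In a hand-written optimizer loop, the factor for the current iteration would multiply the high-dimensional affinities before the gradient is computed.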
UnFolding protein with t-SNE: The following video clip shows the 3D structure of a protein complex (6EMK, https://t.co/kdtUf451qH). The t-SNE embedding shows clearly the C2 symmetry between upper and lower halves.
t-SNE for unfolding protein 3D structure: When we add the sequential index as the fourth dimension to the 3D coordinates of polypeptides, t-SNE can produce 2D maps that unfold their 3D structures and reveal more sub-clusters with distinct shapes.
Besides giving an overview of the whole 3D structure, a 2D map also captures local structures like alpha helices, and large-scale structures like symmetries and repeating patterns.
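Adding the sequential index as a fourth dimension could be sketched as follows. The `index_scale` weight is a hypothetical parameter: the post does not say how the index is scaled relative to the spatial coordinates, so the default of 1.0 is only a placeholder.

```python
def add_sequence_index(coords, index_scale=1.0):
    """Append each residue's position in the chain as a fourth coordinate.

    `coords` is a list of (x, y, z) tuples in chain order.
    """
    return [(x, y, z, i * index_scale)
            for i, (x, y, z) in enumerate(coords)]
```

The resulting 4D points would then be passed to t-SNE in place of the raw 3D coordinates. A larger `index_scale` keeps sequence neighbours closer together in the map.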
t-SNE for 2D protein maps: In order to facilitate the exploration of 3D folding structure of protein polymers, we can use t-SNE to create 2D maps for protein polymers. Here are the steps:
@MicTott With appropriate parameters, t-SNE mostly converges to a unique embedding, up to rotation. If you get different embeddings with different initializations, it rather indicates that the t-SNE settings are not properly chosen.
@wats_updog @datepsych The main purpose of t-SNE is to preserve local and global distances/similarity, and it worked better than most other MDS algorithms. The sentence "it proves nothing" or "distance is meaningless" is meaningless itself, and not even wrong.
@datepsych Most criticisms here are rather unsubstantiated. It would, however, be helpful to find the 3 to 5 features that contribute most to the cluster separation. You can just examine the average values of a small selected region, and move the region from one cluster to another.
t-SNE was the true breakthrough among some three dozen nonlinear dim. reduction methods in 2008. bhSne, fastSne and umap are sad "optimizations" which ignore large distances, making them more or less useless for complex data. https://t.co/cvzwYOonXC
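One way to check whether two runs agree up to rotation is to rotate one 2D map onto the other and look at the leftover error. This is a minimal sketch, assuming both maps list the same data points in the same order. It handles translation and rotation only, not mirroring or rescaling. The function name `rotation_residual` is hypothetical.

```python
import math


def rotation_residual(map_a, map_b):
    """RMS distance between two 2D maps after optimally rotating map_a.

    Both maps are lists of (x, y) tuples for the same points, same order.
    """
    n = len(map_a)

    def centered(points):
        # Remove the centroid so only rotation remains to be fitted.
        cx = sum(p[0] for p in points) / n
        cy = sum(p[1] for p in points) / n
        return [(x - cx, y - cy) for x, y in points]

    a, b = centered(map_a), centered(map_b)
    dot = sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))
    cross = sum(ax * by - ay * bx for (ax, ay), (bx, by) in zip(a, b))
    theta = math.atan2(cross, dot)  # least-squares rotation angle in 2D
    c, s = math.cos(theta), math.sin(theta)
    sq = sum((c * ax - s * ay - bx) ** 2 + (s * ax + c * ay - by) ** 2
             for (ax, ay), (bx, by) in zip(a, b))
    return math.sqrt(sq / n)
```

A residual that is small relative to the size of the map suggests the two runs found the same embedding. A large one suggests they differ, or that one map is a mirror image of the other.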
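The region comparison suggested here can be sketched as a ranking of features by how much their averages differ between two selected regions. This assumes the features are on comparable scales (for example, standardized beforehand). The function name `top_separating_features` and its arguments are hypothetical.

```python
def top_separating_features(region_a, region_b, feature_names, k=5):
    """Rank features by the absolute difference of their means.

    `region_a` and `region_b` are lists of rows (one list of feature
    values per data point) taken from two selected map regions.
    Returns the top `k` (name, difference) pairs, largest first.
    """
    def column_means(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    mean_a = column_means(region_a)
    mean_b = column_means(region_b)
    diffs = [(abs(a - b), name)
             for a, b, name in zip(mean_a, mean_b, feature_names)]
    diffs.sort(reverse=True)
    return [(name, d) for d, name in diffs[:k]]
```

Calling this with one region in each of two clusters gives a quick shortlist of features worth inspecting.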