Research Blog
Posts tagged Code
Improvements
Jul 1st
TypeA
- Removed all motifs where both similar drugs bind to the target.
- 30,231 motifs were originally identified, now 27,499 (-2,732).
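A minimal sketch of how the TypeA filter above could look in plain Java. The `Binding` and `TypeAMotif` records, the field names and the toy data are all hypothetical stand-ins; none of this is the Ondex API or the actual project code. It only illustrates dropping motifs where both similar drugs already bind the target.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class TypeAFilterSketch {
    // Hypothetical stand-ins for graph data; these are not Ondex types.
    record Binding(String drug, String target) {}
    record TypeAMotif(String drug1, String drug2, String target) {}

    // Keep a motif unless BOTH similar drugs are already known to bind its target.
    static List<TypeAMotif> dropWhereBothBind(List<TypeAMotif> motifs, Set<Binding> known) {
        return motifs.stream()
                .filter(m -> !(known.contains(new Binding(m.drug1(), m.target()))
                        && known.contains(new Binding(m.drug2(), m.target()))))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Toy data (assumed): d1 and d2 both bind t1, only d3 binds t2.
        Set<Binding> known = Set.of(
                new Binding("d1", "t1"),
                new Binding("d2", "t1"),
                new Binding("d3", "t2"));
        List<TypeAMotif> motifs = List.of(
                new TypeAMotif("d1", "d2", "t1"),   // both bind t1, so it is removed
                new TypeAMotif("d3", "d4", "t2"));  // only d3 binds t2, so it is kept
        List<TypeAMotif> kept = dropWhereBothBind(motifs, known);
        System.out.println(kept.size() + " of " + motifs.size() + " motifs kept");
    }
}
```

Records give value-based `equals`/`hashCode`, so the `Set` lookup works without any extra code.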
TypeB
- Removed motifs where the drug binds to both targets.
- 22,864 motifs were originally identified, now 19,185 (-3,679).
TypeC
- Removed motifs where protein(1) inv_in disease(2) or protein(2) inv_in disease(1).
- 70,866 motifs were originally identified in the dataset, now 70,715 (-151).
- Tried: removing motifs where the target connects to protein(2). This did not take into account that some links are has_similar_sequence/structure rather than is_a, and it left only 200 motifs (-70,515).
- Removed motifs where target is_a protein(2); this made no difference to the total.
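A rough sketch of the TypeC cross-link filter described above, again in plain Java. The `Edge` and `TypeCMotif` records and the toy data are hypothetical and only mirror the wording of the bullets (protein(1), disease(1), protein(2), disease(2)); the real motif structure and the Ondex calls are not shown here.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class TypeCFilterSketch {
    // Hypothetical typed edge and motif; these are not Ondex types.
    record Edge(String from, String type, String to) {}
    record TypeCMotif(String protein1, String disease1, String protein2, String disease2) {}

    // True only if an edge with exactly this relation type exists.
    static boolean hasEdge(Set<Edge> edges, String from, String type, String to) {
        return edges.contains(new Edge(from, type, to));
    }

    // Drop motifs where protein(1) inv_in disease(2) or protein(2) inv_in disease(1).
    static List<TypeCMotif> dropCrossLinked(List<TypeCMotif> motifs, Set<Edge> edges) {
        return motifs.stream()
                .filter(m -> !hasEdge(edges, m.protein1(), "inv_in", m.disease2())
                        && !hasEdge(edges, m.protein2(), "inv_in", m.disease1()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Toy data (assumed): p3 is already involved in d4, the "other" disease of its motif.
        Set<Edge> edges = Set.of(
                new Edge("p1", "inv_in", "d1"),
                new Edge("p2", "inv_in", "d2"),
                new Edge("p3", "inv_in", "d3"),
                new Edge("p4", "inv_in", "d4"),
                new Edge("p3", "inv_in", "d4"));
        List<TypeCMotif> motifs = List.of(
                new TypeCMotif("p1", "d1", "p2", "d2"),   // no cross link, so it is kept
                new TypeCMotif("p3", "d3", "p4", "d4"));  // p3 inv_in d4, so it is removed
        List<TypeCMotif> kept = dropCrossLinked(motifs, edges);
        System.out.println(kept.size() + " of " + motifs.size() + " motifs kept");
    }
}
```

Because `hasEdge` matches on the relation type, a has_similar_sequence link would not be mistaken for an is_a link, which is the distinction the "Tried" attempt missed.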
Ondex
- Selecting a node with a certain id in Ondex?
- Too hard when there are 500k nodes
- Search doesn’t work
- Ondex Console: how to use this??? -> help is unavailable surprisingly.
Cleaning
Jun 20th
Cleaned up the code.
- Removed deprecated stuff
- Added some comments
Also fixed the name servers.
Motif Definition 2
Jun 17th
The second motif that was defined has been completed in Java.
Example output from Java (cutoff dataset):
Code was run on the updated dataset:
- There were 70,866 instances of this motif definition in the dataset.
- However, this doesn’t address the issue of the OMIM database, so many of these will not have true “Diseases”. This would probably be best addressed when trying to score this type, if it’s even possible to do this.
Previous Code
May 30th
The previous code used to count the abstract semantic motif within the data has been modified and extended so that it now functions correctly (i.e. updated for the latest version of Java, new Ondex functions, deprecated Ondex methods, etc.).
Testing gave:
- 157 semantic motifs for the cutdown_set.xml.gz dataset.
- 26,693 semantic motifs for the ib2010_data.xml.gz dataset.
This is the same as what the paper stated, showing that the newest code is working correctly.
