Programming

Cutting down

~70k seems a high number of motifs for the TypeC motif.

This high number comes from the multiple diseases associated with Protein(1) and the multiple diseases associated with Protein(2). Each combination is counted as a separate motif.
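The multiplication can be made concrete with a minimal Java sketch. It assumes that each (Disease(1), Disease(2)) pairing for a single Protein(1)/Protein(2) pair is reported as its own motif. The class `MotifCountSketch` and the disease names are hypothetical, and no Ondex API is used.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: why each extra disease multiplies the TypeC motif count (hypothetical data). */
class MotifCountSketch {

    // Assumption: every (Disease(1), Disease(2)) pairing for one Protein(1)/Protein(2)
    // pair is reported as its own motif, so the count is |diseases1| * |diseases2|.
    static List<String> enumerate(List<String> diseases1, List<String> diseases2) {
        List<String> motifs = new ArrayList<>();
        for (String d1 : diseases1) {
            for (String d2 : diseases2) {
                motifs.add(d1 + " / " + d2);
            }
        }
        return motifs;
    }

    public static void main(String[] args) {
        // Hypothetical disease lists for a single protein pair.
        List<String> motifs = enumerate(
                List.of("DiseaseA", "DiseaseB", "DiseaseC"),
                List.of("DiseaseX", "DiseaseY"));
        System.out.println(motifs.size()); // 6 motifs from one protein pair
    }
}
```

A protein pair with three and two associated diseases therefore contributes six motifs rather than one, which is how the total climbs so quickly.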

  • Do not count motifs where Protein(2) is already known to interact with Disease(1), or Protein(1) with Disease(2) (not of interest). This only cut the results by ~200, to 70,510.
  • Remove motifs where Target interacts with Protein(1) and Protein(2). This cut the result to 200 motifs.
  • Only 200, because I only checked whether Target was connected to Protein(2) via “is_a”, but other relation types, such as “h_s_s”, can also connect these nodes.
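Checking every relation type, not just “is_a”, could look roughly like the minimal sketch below. The `Relation` and `Motif` records are hypothetical ID-only stand-ins for Ondex relations and motifs, not the Ondex API, and only the Target/Protein(2) link described above is tested.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch: drop motifs whose Target is already linked to Protein(2) by an accepted relation type. */
class TargetFilterSketch {

    // Hypothetical stand-ins for Ondex relations and motifs (IDs only).
    record Relation(int from, int to, String type) {}
    record Motif(int protein1, int protein2, int target) {}

    // Keep only motifs where no relation of an accepted type joins Target and Protein(2).
    // Passing null for acceptedTypes means "any relation type".
    static List<Motif> filter(List<Motif> motifs, List<Relation> relations, Set<String> acceptedTypes) {
        Set<String> linked = new HashSet<>();
        for (Relation r : relations) {
            if (acceptedTypes == null || acceptedTypes.contains(r.type())) {
                // Store both directions so the check does not depend on edge direction.
                linked.add(r.from() + ":" + r.to());
                linked.add(r.to() + ":" + r.from());
            }
        }
        List<Motif> kept = new ArrayList<>();
        for (Motif m : motifs) {
            if (!linked.contains(m.target() + ":" + m.protein2())) {
                kept.add(m);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Relation> relations = List.of(
                new Relation(10, 2, "is_a"),
                new Relation(11, 3, "h_s_s"));
        List<Motif> motifs = List.of(
                new Motif(1, 2, 10),   // Target 10 linked to Protein(2) = 2 via is_a
                new Motif(1, 3, 11),   // Target 11 linked to Protein(2) = 3 via h_s_s
                new Motif(1, 4, 12));  // no link
        System.out.println(filter(motifs, relations, Set.of("is_a")).size()); // 2
        System.out.println(filter(motifs, relations, null).size());           // 1
    }
}
```

With only “is_a” accepted, the motif linked through “h_s_s” survives; accepting any relation type removes it as well.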

Chlorpromazine Motif

Cross Checking

Linux (Ubuntu) was successfully installed onto the machine, along with the latest version of Ondex. The large dataset seems to work fine under Linux with 3.6 GB of memory allocated to the program. I searched for Chlorpromazine within the dataset to get the node IDs of the semantic motif discussed in the paper.

  • Drug 1 – Chlorpromazine – ID:120943
  • Drug 2 – Trimeprazine – ID:120836
  • Target – Histamine H1 Receptor – ID:715

I then traversed the output of the Java code in Ondex Mini and successfully found the motif:

Chlorpromazine Motif
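This cross-check could also be repeated automatically on every run. The minimal sketch below assumes motifs are reduced to ID-only records (a hypothetical `Motif` record, not an Ondex class) and uses the three node IDs found above; the list of found motifs is made up for illustration.

```java
import java.util.List;

/** Sketch: confirm the Chlorpromazine motif is among the motifs the code reports. */
class CrossCheckSketch {

    // Hypothetical ID-only motif; the three IDs come from the node search above.
    record Motif(int drug1, int drug2, int target) {}

    static final Motif CHLORPROMAZINE = new Motif(120943, 120836, 715);

    static boolean containsExpected(List<Motif> found, Motif expected) {
        return found.contains(expected); // records compare by value, field by field
    }

    public static void main(String[] args) {
        // Hypothetical output list; in practice this would come from the motif search.
        List<Motif> found = List.of(new Motif(120943, 120836, 715), new Motif(1, 2, 3));
        System.out.println(containsExpected(found, CHLORPROMAZINE)); // true
    }
}
```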

Definition of a more complex example

  • Started implementing a more complex motif, i.e. one that looks for similar proteins.
  • Developed it using the chart created in a previous post.
  • It seems to be working, although there are some issues with the Disease output and with deciding when to increment the motif counter / add to the motif Set.
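One possible way to keep the counter and the Set in step is to let `Set.add` decide, since it returns `true` only when the element was not already present. The sketch below is an assumption about how this could be structured, not the existing code; motifs are keyed by hypothetical strings.

```java
import java.util.HashSet;
import java.util.Set;

/** Sketch: let Set.add decide when the motif counter is incremented. */
class MotifCounterSketch {

    private final Set<String> motifs = new HashSet<>();
    private int counter = 0;

    // Set.add returns true only if the element was not already present,
    // so a motif reached through a second path is not counted twice.
    void addMotif(String motifKey) {
        if (motifs.add(motifKey)) {
            counter++;
        }
    }

    int counter() { return counter; }
    int setSize() { return motifs.size(); }

    public static void main(String[] args) {
        MotifCounterSketch s = new MotifCounterSketch();
        s.addMotif("P1|P2|Target");
        s.addMotif("P1|P2|Target"); // duplicate: not counted again
        s.addMotif("P1|P3|Target");
        System.out.println(s.counter() + " " + s.setSize()); // 2 2
    }
}
```

With this pattern the separate counter is redundant, because `motifs.size()` always gives the same number.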

Abstract Motifs

Developed a Java class to represent an abstractMotif as described in the paper. It has fields for drug1, drug2 and the potential target for drug1 (see previous post). Created a TreeSet (which keeps the Set sorted automatically, currently by drug1.getId()) to store the motifs found within the dataset; motifs are added in the method where a target is discovered. These motifs can then be manipulated when scoring this type of motif.
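One caveat with sorting by drug1.getId(): a TreeSet treats two elements as duplicates whenever the comparison returns 0, so if the ordering looks at drug1 alone, any second motif sharing the same drug1 is silently dropped. A minimal sketch of the difference, using a hypothetical ID-only `AbstractMotif` record rather than the real class:

```java
import java.util.Comparator;
import java.util.TreeSet;

/** Sketch of an ID-only abstract motif that is safe to store in a TreeSet. */
class AbstractMotifSketch {

    // Hypothetical stand-in: plain int IDs instead of Ondex concepts.
    record AbstractMotif(int drug1, int drug2, int target) {}

    // Comparing on drug1 alone: TreeSet treats compare(...) == 0 as "already present".
    static final Comparator<AbstractMotif> BY_DRUG1_ONLY =
            Comparator.comparingInt(AbstractMotif::drug1);

    // Breaking ties on drug2 and target keeps distinct motifs distinct.
    static final Comparator<AbstractMotif> BY_ALL_FIELDS =
            Comparator.comparingInt(AbstractMotif::drug1)
                      .thenComparingInt(AbstractMotif::drug2)
                      .thenComparingInt(AbstractMotif::target);

    static int countDistinct(Comparator<AbstractMotif> order, AbstractMotif... motifs) {
        TreeSet<AbstractMotif> set = new TreeSet<>(order);
        for (AbstractMotif m : motifs) {
            set.add(m);
        }
        return set.size();
    }

    public static void main(String[] args) {
        AbstractMotif a = new AbstractMotif(120943, 120836, 715);
        AbstractMotif b = new AbstractMotif(120943, 1, 2); // same drug1, different motif
        System.out.println(countDistinct(BY_DRUG1_ONLY, a, b)); // 1: b is silently dropped
        System.out.println(countDistinct(BY_ALL_FIELDS, a, b)); // 2
    }
}
```

The same tie-breaking applies if the real class implements `Comparable` and its `compareTo` only compares drug1.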

It would be a good idea to cross-check this against the Chlorpromazine motif, but I first need to get Linux running properly in a dual-boot setup because of memory issues in Windows (not being able to print concept names is also a problem).

Wrote a toString() method to give nicer output for testing purposes:

Drug1 <--> Drug2 --> Target

I wanted the concept names instead, for example for Chlorpromazine:

Chlorpromazine <--> Trimeprazine --> Histamine H1 Receptor

However, printing names through the Ondex API does not seem to work, so I am using getId() for now.
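A toString() that prints IDs now but can switch to names later might look like the minimal sketch below. The ID-to-name `Map` is a hypothetical lookup (it is not an Ondex call); any ID without a known name falls back to the number itself.

```java
import java.util.Map;

/** Sketch: "Drug1 <--> Drug2 --> Target" output, using names when known and IDs otherwise. */
class MotifToStringSketch {

    record AbstractMotif(int drug1, int drug2, int target) {

        // names is a hypothetical ID -> concept-name lookup; missing names fall back to the ID.
        String format(Map<Integer, String> names) {
            return label(drug1, names) + " <--> " + label(drug2, names) + " --> " + label(target, names);
        }

        private static String label(int id, Map<Integer, String> names) {
            return names.getOrDefault(id, String.valueOf(id));
        }

        @Override
        public String toString() {
            return format(Map.of()); // IDs only, as in the current workaround
        }
    }

    public static void main(String[] args) {
        AbstractMotif m = new AbstractMotif(120943, 120836, 715);
        System.out.println(m); // 120943 <--> 120836 --> 715
        Map<Integer, String> names = Map.of(
                120943, "Chlorpromazine",
                120836, "Trimeprazine",
                715, "Histamine H1 Receptor");
        System.out.println(m.format(names)); // Chlorpromazine <--> Trimeprazine --> Histamine H1 Receptor
    }
}
```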

Output

Here is some current output of the basic semantic motif definition.

Differences

This new implementation differs from the code previously developed by Cockell, S.J. for the 2010 paper. On cutoff_data.xml.gz it identifies 157 motifs, the same as the other method. However, the results differ on the larger dataset ib2010_data.xml.gz: the previous code identified 26,693 motifs, compared with 29,633 for the new implementation, a difference of 2,940.
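The figure of 2,940 is a net difference, so it could hide motifs that each implementation finds and the other misses. A minimal sketch of comparing the two result sets in both directions, assuming each motif can be reduced to a hypothetical "drug1|drug2|target" string key:

```java
import java.util.Set;
import java.util.TreeSet;

/** Sketch: list the motifs found by one implementation but not the other. */
class MotifDiffSketch {

    // Motifs are compared as "drug1|drug2|target" keys here (hypothetical format).
    static Set<String> onlyIn(Set<String> a, Set<String> b) {
        Set<String> diff = new TreeSet<>(a); // copy, sorted for easier reading
        diff.removeAll(b);
        return diff;
    }

    public static void main(String[] args) {
        // Hypothetical result sets: the net difference is 1, but 3 motifs differ.
        Set<String> previous = Set.of("1|2|3", "4|5|6");
        Set<String> current = Set.of("1|2|3", "7|8|9", "10|11|12");
        System.out.println(onlyIn(current, previous)); // [10|11|12, 7|8|9]
        System.out.println(onlyIn(previous, current)); // [4|5|6]
    }
}
```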

Note

It is also worth remembering what was discovered in the proper dataset:

  • Drugs seem to have the ConceptClass “Comp:Compound” rather than the ConceptClass “Compound”.
  • There are only 70 concepts of class “Compound”, none of which seem to bind to a target.
  • There are over 8k concepts of class “Comp:Compound”, so I will use this one.
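Counts like these can be produced by grouping concepts by their ConceptClass ID. The sketch below uses a hypothetical `Concept` record holding only an ID and a class ID, not the Ondex concept API, and the concepts in `main` are made up.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: count concepts per ConceptClass ID before choosing which class to treat as drugs. */
class ConceptClassCountSketch {

    // Hypothetical stand-in for an Ondex concept: just an ID and its ConceptClass ID.
    record Concept(int id, String conceptClass) {}

    static Map<String, Integer> countByClass(List<Concept> concepts) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by class ID for readable output
        for (Concept c : concepts) {
            counts.merge(c.conceptClass(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Concept> concepts = List.of(
                new Concept(1, "Comp:Compound"),
                new Concept(2, "Comp:Compound"),
                new Concept(3, "Compound"),
                new Concept(4, "Protein"));
        System.out.println(countByClass(concepts)); // {Comp:Compound=2, Compound=1, Protein=1}
    }
}
```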

Chart

A graph of what the basic motifs for protein similarity and drug similarity look like, so that I can see the names of the ConceptClasses and, more importantly, the RelationTypes.

abstract motif

Potential solution (in progress) to this:

The code appears to work: it traverses the graph and prints out the information for the nodes (although not for each individual motif).

June 8

Today:

  • How to narrow down the results.
  • Not all relations have data attached to them, for example blast or sesame.
  • GDS.getValue() for “Tanimoto” returns an Object; convert it to a double to compare it with a double cutoff_value.
  • Check the Tanimoto score before anything else starts? i.e. an if statement in start() before the compounds are examined. Using 1.0 as the cutoff cut down the number of motifs found by 81%.
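The Object-to-double conversion and the early cutoff check could be sketched as below. This does not call the Ondex GDS API: it assumes the value handed back is some `java.lang.Number` (or possibly a `String`), treats a missing value as "no score" because not all relations carry data, and assumes pairs at or above the cutoff are the ones kept.

```java
/** Sketch: turn an attribute value returned as Object into a double and test it against a cutoff. */
class TanimotoCutoffSketch {

    // Assumption: the stored value is a java.lang.Number (e.g. Double or Float),
    // or possibly a String; anything else is treated as "no score".
    static double toDouble(Object value) {
        if (value instanceof Number n) {
            return n.doubleValue();
        }
        if (value instanceof String s) {
            try {
                return Double.parseDouble(s.trim());
            } catch (NumberFormatException e) {
                return Double.NaN;
            }
        }
        return Double.NaN; // null or unexpected type: the relation has no usable score
    }

    // Intended to run up front, before any compounds are examined; a missing score never passes.
    static boolean passesCutoff(Object tanimotoValue, double cutoffValue) {
        double score = toDouble(tanimotoValue);
        return !Double.isNaN(score) && score >= cutoffValue;
    }

    public static void main(String[] args) {
        double cutoffValue = 1.0; // the cutoff used in the notes above
        System.out.println(passesCutoff(1.0, cutoffValue));   // true
        System.out.println(passesCutoff(0.85f, cutoffValue)); // false
        System.out.println(passesCutoff(null, cutoffValue));  // false
    }
}
```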