<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Semantic Motif Hunting &#187; Programming</title>
	<atom:link href="http://bio.adking.co.uk/category/programming/feed/" rel="self" type="application/rss+xml" />
	<link>http://bio.adking.co.uk</link>
	<description>Research Blog</description>
	<lastBuildDate>Wed, 06 Oct 2010 22:04:10 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Improvements</title>
		<link>http://bio.adking.co.uk/2010/07/01/improvements/</link>
		<comments>http://bio.adking.co.uk/2010/07/01/improvements/#comments</comments>
		<pubDate>Thu, 01 Jul 2010 11:35:22 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[improvements]]></category>
		<category><![CDATA[Motifs]]></category>

		<guid isPermaLink="false">http://bio.adking.co.uk/?p=185</guid>
		<description><![CDATA[TypeA Removed all motifs where both similar drugs bind to the target. 30,231 motifs were originally identified, now 27,499 (-2,732). TypeB Removed motifs where the drug binds to both targets. 22,864 motifs were originally identified, now 19,185 (-3,679). TypeC Removed motifs where protein(1) inv_in disease(2) or protein(2) inv_in disease(1). 70,866 motifs were originally identified in]]></description>
			<content:encoded><![CDATA[<p><strong>TypeA</strong></p>
<ul>
<li>Removed all motifs where both similar drugs bind to the target.</li>
<li>30,231 motifs were originally identified, now <strong>27,499</strong> (-2,732).</li>
</ul>
<p><strong>TypeB</strong></p>
<ul>
<li>Removed motifs where the drug binds to both targets.</li>
<li>22,864 motifs were originally identified, now <strong>19,185</strong> (-3,679).</li>
</ul>
<p><strong>TypeC</strong></p>
<ul>
<li>Removed motifs where protein(1) inv_in disease(2) or protein(2) inv_in disease(1).</li>
<li>70,866 motifs were originally identified in the dataset, now <strong>70,715</strong> (-151).</li>
<li>Tried removing motifs where the target connects to protein(2). This did not take into account that some links are has_similar_sequence/structure rather than is_a, so it left only <strong>200</strong> (-70,515).</li>
<li>Removed motifs where the target is_a protein(2); this made no difference to the total.</li>
</ul>
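<p>The TypeA filter described above could be sketched roughly as follows. This is a minimal illustration rather than the project code: the class name, the <code>{drug1, drug2, target}</code> array layout and the integer IDs are all hypothetical stand-ins for Ondex concepts and relations.</p>

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: plain int IDs stand in for Ondex concepts.
final class TypeAFilter {

    // motifs:   each row is {drug1, drug2, target}
    // bindings: each row is {drug, target} for a known binds_to relation
    // Returns how many motifs survive once those where BOTH similar drugs
    // already bind the target have been removed.
    static int countKept(int[][] motifs, int[][] bindings) {
        Set<String> known = new HashSet<>();
        for (int[] b : bindings) {
            known.add(b[0] + ":" + b[1]);
        }
        int kept = 0;
        for (int[] m : motifs) {
            boolean drug1Binds = known.contains(m[0] + ":" + m[2]);
            boolean drug2Binds = known.contains(m[1] + ":" + m[2]);
            if (!(drug1Binds && drug2Binds)) {
                kept++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int[][] bindings = {{2, 9}, {3, 9}, {4, 9}};
        int[][] motifs = {
            {1, 2, 9}, // drug 1 is not known to bind target 9: kept
            {3, 4, 9}  // both drugs already bind target 9: removed
        };
        System.out.println(countKept(motifs, bindings));
    }
}
```

<p>The TypeB and TypeC filters would follow the same pattern with a different exclusion test.</p>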
<p><strong>Ondex</strong></p>
<ul>
<li>How can a node with a certain ID be selected in Ondex?</li>
<li>Too hard when there are 500k nodes.</li>
<li>Search doesn&#8217;t work.</li>
<li>Ondex Console: how is this used? Surprisingly, help is unavailable.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://bio.adking.co.uk/2010/07/01/improvements/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cutting down</title>
		<link>http://bio.adking.co.uk/2010/06/30/cutting-down/</link>
		<comments>http://bio.adking.co.uk/2010/06/30/cutting-down/#comments</comments>
		<pubDate>Wed, 30 Jun 2010 13:25:55 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://bio.adking.co.uk/?p=173</guid>
		<description><![CDATA[~70k seems a high number for the TypeC motif. This high number is due to the multiple diseases involved with Protein(1) and the multiple diseases involved with Protein(2). Each of these will be a separate motif in the count. Do not count motifs where Protein(2) is known to interact with Disease(1), or Protein(1) with Disease(2) (not interested). Only]]></description>
			<content:encoded><![CDATA[<p>~70k seems a high number for the TypeC motif.</p>
<p>This high number is due to the multiple diseases involved with Protein(1) and the multiple diseases involved with Protein(2). Each of these will be a separate motif in the count.</p>
<ul>
<li>Do not count motifs where Protein(2) is known to interact with Disease(1), or Protein(1) with Disease(2) (not interested). This only cut the results by ~200, to <strong>70,510</strong>.</li>
<li>Remove motifs where the Target interacts with Protein(1) and Protein(2). This cut the result to <strong>200</strong> motifs.</li>
<li>Only 200 remained because I only checked whether the Target was connected to Protein(2), but relations other than &#8220;is_a&#8221;, such as &#8220;h_s_s&#8221;, can also connect these nodes.</li>
</ul>
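<p>The difference between &#8220;connected at all&#8221; and &#8220;connected by is_a&#8221; can be sketched as below. This is an assumed, simplified edge store, not the Ondex API: the class, its method names and the integer node IDs are hypothetical, and only the relation type names come from the notes above.</p>

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical edge store; real code would query the Ondex graph instead.
final class RelationCheck {

    // Each edge is stored as "from-to:type".
    private final Set<String> edges = new HashSet<>();

    void addRelation(int from, int to, String type) {
        edges.add(from + "-" + to + ":" + type);
    }

    // True if ANY relation type links the two nodes (the over-eager check).
    boolean connected(int from, int to) {
        String prefix = from + "-" + to + ":";
        for (String edge : edges) {
            if (edge.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // True only if a relation of the given type links the two nodes.
    boolean connectedBy(int from, int to, String type) {
        return edges.contains(from + "-" + to + ":" + type);
    }

    public static void main(String[] args) {
        RelationCheck graph = new RelationCheck();
        int target = 1;
        int protein2 = 2;
        graph.addRelation(target, protein2, "h_s_s");

        // The untyped check would discard this motif...
        System.out.println(graph.connected(target, protein2));
        // ...while the typed check keeps it, because the link is not is_a.
        System.out.println(graph.connectedBy(target, protein2, "is_a"));
    }
}
```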
<p><a href="http://bio.adking.co.uk/wp-content/uploads/2010/06/p_p_similar.png"><img class="aligncenter size-full wp-image-82" title="p_p_similar" src="http://bio.adking.co.uk/wp-content/uploads/2010/06/p_p_similar.png" alt="metagraph" width="422" height="218" /></a> <a href="http://bio.adking.co.uk/wp-content/uploads/2010/06/loggy.txt">log</a></p>
]]></content:encoded>
			<wfw:commentRss>http://bio.adking.co.uk/2010/06/30/cutting-down/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mare</title>
		<link>http://bio.adking.co.uk/2010/06/28/146/</link>
		<comments>http://bio.adking.co.uk/2010/06/28/146/#comments</comments>
		<pubDate>Mon, 28 Jun 2010 09:58:02 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Paper]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[plan]]></category>

		<guid isPermaLink="false">http://bio.adking.co.uk/?p=146</guid>
		<description><![CDATA[Motif Finder Exported the datasets ready for use with software (motif finder) that searches for overrepresented motifs. (Hypergeometric tests: 1 day of work). Cytoscape has a plugin. ib_2010_data.1.xml.gz &#8211;&#62; .dot (50mb) ib_2010_data.1.xml.gz &#8211;&#62; graph htm??? (can&#8217;t remember, but it was 450mb) Motif Type C Trying to define another motif, struggling&#8230; similar targets? Plan Results/Code/Other program]]></description>
			<content:encoded><![CDATA[<p><strong>Motif Finder</strong></p>
<p>Exported the datasets ready for use with software (motif finder) that searches for overrepresented motifs. (Hypergeometric tests: 1 day of work). Cytoscape has a plugin.</p>
<ul>
<li>ib_2010_data.1.xml.gz &#8211;&gt; .dot (50mb)</li>
<li>ib_2010_data.1.xml.gz &#8211;&gt; graph htm??? (can&#8217;t remember, but it was 450mb)</li>
</ul>
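<p>For reference, a hypergeometric over-representation p-value can be computed as in the sketch below. This only shows the general formula with toy numbers; how the motif finder software actually applies the test to motif counts is not covered here, and the class and parameter names are made up for illustration.</p>

```java
// Generic hypergeometric upper tail; all names and numbers are illustrative.
final class Hypergeometric {

    // Natural log of the binomial coefficient C(n, k), summed term by term
    // so that large counts do not overflow.
    static double logChoose(int n, int k) {
        double sum = 0.0;
        for (int j = 1; j <= k; j++) {
            sum += Math.log(n - k + j) - Math.log(j);
        }
        return sum;
    }

    // P(X >= observed) when `draws` items are taken without replacement from
    // `population` items, of which `successes` are hits.
    static double upperTail(int population, int successes, int draws, int observed) {
        int failures = population - successes;
        int lowest = Math.max(Math.max(observed, draws - failures), 0);
        int highest = Math.min(draws, successes);
        double logDenominator = logChoose(population, draws);
        double p = 0.0;
        for (int i = lowest; i <= highest; i++) {
            p += Math.exp(logChoose(successes, i)
                    + logChoose(failures, draws - i)
                    - logDenominator);
        }
        return Math.min(1.0, p);
    }

    public static void main(String[] args) {
        // Toy case: 10 items, 5 hits, 5 drawn, all 5 drawn are hits.
        // Exactly one of the C(10, 5) = 252 possible draws does this.
        System.out.println(upperTail(10, 5, 5, 5));
    }
}
```

<p>A small p-value means the observed count would be unlikely if hits were spread at random, which is the sense of &#8220;overrepresented&#8221; used above.</p>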
<p><strong>Motif Type C</strong></p>
<p>Trying to define another motif, struggling&#8230; similar targets?</p>
<div id="attachment_170" class="wp-caption aligncenter" style="width: 211px"><a href="http://bio.adking.co.uk/wp-content/uploads/2010/06/MotifB.png"><img src="http://bio.adking.co.uk/wp-content/uploads/2010/06/MotifB.png" alt="B" title="MotifB" width="201" height="178" class="size-full wp-image-170" /></a><p class="wp-caption-text">B</p></div>
<p><strong>Plan</strong></p>
<p><strong>Results/Code/Other program (June-July 2010)</strong></p>
<p>4 Weeks</p>
<ul>
<li>Week 1 &#8211; Code/Motif Finder (i.e. Motif A,B,C&#8230; x,y,z instances)</li>
<li>Week 2 &#8211; Java/Score/FoundExamples</li>
<li>Week 3 &#8211; Score</li>
<li>Week 4 &#8211; Score</li>
</ul>
<p><strong>Paper (July-August 2010)</strong></p>
<ul>
<li>Intro</li>
<li>Background</li>
<li>Methods
<ul>
<li>Definitions of S.M.</li>
<li>Searching</li>
<li>Scoring</li>
</ul>
</li>
<li>Results
<ul>
<li>TypeA</li>
<li>TypeB</li>
<li>TypeC</li>
</ul>
</li>
<li>Discussion
<ul>
<li>Future work</li>
</ul>
</li>
<li>Abstract</li>
</ul>
<p><a title="Webpage" href="https://internal.cs.ncl.ac.uk/modules/2009-10/csc8399/project_deliverables.html" target="_blank">Module Webpage</a></p>
]]></content:encoded>
			<wfw:commentRss>http://bio.adking.co.uk/2010/06/28/146/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Cleaning</title>
		<link>http://bio.adking.co.uk/2010/06/20/cleaning/</link>
		<comments>http://bio.adking.co.uk/2010/06/20/cleaning/#comments</comments>
		<pubDate>Sun, 20 Jun 2010 21:11:54 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Java]]></category>

		<guid isPermaLink="false">http://bio.adking.co.uk/?p=132</guid>
		<description><![CDATA[Cleaned up the code. Removed deprecated stuff Added some comments Also fixed the name servers.]]></description>
			<content:encoded><![CDATA[<p>Cleaned up the code.</p>
<ul>
<li>Removed deprecated stuff</li>
<li>Added some comments</li>
</ul>
<p>Also fixed the name servers.</p>
]]></content:encoded>
			<wfw:commentRss>http://bio.adking.co.uk/2010/06/20/cleaning/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Motif Definition 2</title>
		<link>http://bio.adking.co.uk/2010/06/17/motif-definition-2/</link>
		<comments>http://bio.adking.co.uk/2010/06/17/motif-definition-2/#comments</comments>
		<pubDate>Thu, 17 Jun 2010 14:08:58 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Ondex]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[motif]]></category>
		<category><![CDATA[protein]]></category>

		<guid isPermaLink="false">http://bio.adking.co.uk/?p=125</guid>
		<description><![CDATA[The second motif that was defined has been completed in Java. Example output from Java (cutoff dataset): The code was run on the updated dataset: There were 70,866 instances of this motif definition in the dataset. However, this doesn&#8217;t address the issue of the OMIM database, so many of these will not have true &#8220;Diseases&#8221;. This]]></description>
			<content:encoded><![CDATA[<p>The second motif that was defined has been completed in Java.</p>
<p><a href="http://bio.adking.co.uk/wp-content/uploads/2010/06/p_p_similar.png"><img class="aligncenter size-full wp-image-82" title="p_p_similar" src="http://bio.adking.co.uk/wp-content/uploads/2010/06/p_p_similar.png" alt="metagraph" width="422" height="218" /></a>Example output from Java (cutoff dataset):</p>
<p><a href="http://bio.adking.co.uk/wp-content/uploads/2010/06/newAbstractBig.jpg"><img class="aligncenter size-full wp-image-126" title="proteintoprotein" src="http://bio.adking.co.uk/wp-content/uploads/2010/06/newAbstractBig.jpg" alt="Protein-Protein-Simularity" width="259" height="262" /></a></p>
<p>The code was run on the updated dataset:</p>
<ul>
<li>There were <strong>70,866</strong> instances of this motif definition in the dataset.</li>
<li>However, this doesn&#8217;t address the issue of the OMIM database, so many of these will not have true &#8220;Diseases&#8221;. This would probably be best addressed when trying to score this type, if it&#8217;s even possible to do this.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://bio.adking.co.uk/2010/06/17/motif-definition-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cross Checking</title>
		<link>http://bio.adking.co.uk/2010/06/15/cross-checking/</link>
		<comments>http://bio.adking.co.uk/2010/06/15/cross-checking/#comments</comments>
		<pubDate>Tue, 15 Jun 2010 21:35:49 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Project]]></category>

		<guid isPermaLink="false">http://bio.adking.co.uk/?p=93</guid>
		<description><![CDATA[Linux (Ubuntu) was successfully installed on the machine along with the latest version of Ondex. The large dataset seemed to work fine in Linux with 3.6 GB of memory allowed for the program. I searched for Chlorpromazine within the dataset to get the node IDs of the semantic motif discussed in the paper. Drug 1 &#8211;]]></description>
			<content:encoded><![CDATA[<p>Linux (Ubuntu) was successfully installed on the machine along with the latest version of Ondex. The large dataset seemed to work fine in Linux with 3.6 GB of memory allowed for the program. I searched for Chlorpromazine within the dataset to get the node IDs of the semantic motif discussed in the paper.</p>
<ul>
<li>Drug 1 &#8211; Chlorpromazine &#8211; ID:120943</li>
<li>Drug 2 &#8211; Trimeprazine &#8211; ID:120836</li>
<li>Target &#8211; Histamine H1 Receptor &#8211; ID:715</li>
</ul>
<p>I then traversed the output of the Java code in Ondex Mini, and successfully found the motif:</p>
<p><a href="http://bio.adking.co.uk/wp-content/uploads/2010/06/chlor.jpg"><img class="aligncenter size-medium wp-image-106" title="Chlorpromazine Motif" src="http://bio.adking.co.uk/wp-content/uploads/2010/06/chlor-300x168.jpg" alt="Chlorpromazine Motif" width="300" height="168" /></a></p>
<p><strong>Definition of a more complex example</strong></p>
<ul>
<li>Started an implementation of a more complex motif i.e. look for similar proteins.</li>
<li>Developed it using the chart created on a previous post.</li>
<li>Seems to be working okay, although there are some issues with the Disease output and when to increment the motif counter/add to the motif Set.</li>
</ul>
<p><a href="http://bio.adking.co.uk/wp-content/uploads/2010/06/bigger.jpg"><img class="aligncenter size-full wp-image-117" title="ExampleOutput" src="http://bio.adking.co.uk/wp-content/uploads/2010/06/bigger.jpg" alt="" width="540" height="243" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://bio.adking.co.uk/2010/06/15/cross-checking/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Abstract Motifs</title>
		<link>http://bio.adking.co.uk/2010/06/14/abstract-motifs/</link>
		<comments>http://bio.adking.co.uk/2010/06/14/abstract-motifs/#comments</comments>
		<pubDate>Mon, 14 Jun 2010 15:00:34 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Ondex]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Project]]></category>

		<guid isPermaLink="false">http://bio.adking.co.uk/?p=90</guid>
		<description><![CDATA[Developed a Java class to represent an abstractMotif as described in the paper (has fields for drug1, drug2 and the potential target for drug1 (see previous post)). Created a TreeSet (allows automated sorting of a Set, currently sorting by drug1.getId()) to store motifs found within the dataset; these are added in the method when a target]]></description>
			<content:encoded><![CDATA[<p>Developed a Java class to represent an abstractMotif as described in the paper (has fields for drug1, drug2 and the potential target for drug1 (see previous post)). Created a TreeSet (allows automated sorting of a Set, currently sorting by <code>drug1.getId()</code>) to store motifs found within the dataset; these are added in the method when a target is discovered. These motifs can then be manipulated when trying to score this type of motif.</p>
<p>It would be a good idea to cross check this with Chlorpromazine, but I need to get Linux up and running properly in a dual-boot due to memory issues in Windows (also not being able to print concept names is a problem).</p>
<p>Wrote a toString() method for a nicer output for testing purposes:</p>
<p><code>Drug1 &lt;--&gt; Drug2 --&gt; Target</code></p>
<p>I wanted the concept names, for example for Chlorpromazine:</p>
<p><code>Chlorpromazine &lt;--&gt; Trimeprazine --&gt; Histamine H1 Receptor</code></p>
<p>However, the Ondex API is not working for printing names, so I am using the getId() values for now.</p>
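<p>A minimal sketch of such a class, its <code>toString()</code> format and the TreeSet is shown below. It assumes plain integer IDs in place of Ondex concepts, and the field names and ID values are illustrative rather than taken from the project code. One caveat worth noting: a TreeSet treats elements that compare as equal as duplicates, so ordering by the drug1 ID alone would keep only one motif per drug1. The sketch therefore breaks ties on drug2 and target.</p>

```java
import java.util.TreeSet;

// Illustrative only: int IDs stand in for Ondex concepts.
final class AbstractMotif implements Comparable<AbstractMotif> {
    final int drug1;
    final int drug2;
    final int target;

    AbstractMotif(int drug1, int drug2, int target) {
        this.drug1 = drug1;
        this.drug2 = drug2;
        this.target = target;
    }

    // Sort by drug1 first, then break ties so distinct motifs are not merged.
    @Override
    public int compareTo(AbstractMotif other) {
        int c = Integer.compare(drug1, other.drug1);
        if (c == 0) {
            c = Integer.compare(drug2, other.drug2);
        }
        if (c == 0) {
            c = Integer.compare(target, other.target);
        }
        return c;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof AbstractMotif && compareTo((AbstractMotif) o) == 0;
    }

    @Override
    public int hashCode() {
        return 31 * (31 * drug1 + drug2) + target;
    }

    // Same layout as the test output format: Drug1 <--> Drug2 --> Target
    @Override
    public String toString() {
        return drug1 + " <--> " + drug2 + " --> " + target;
    }

    public static void main(String[] args) {
        TreeSet<AbstractMotif> motifs = new TreeSet<>();
        motifs.add(new AbstractMotif(2, 3, 9));
        motifs.add(new AbstractMotif(1, 2, 9));
        motifs.add(new AbstractMotif(1, 4, 9)); // same drug1, still kept
        motifs.add(new AbstractMotif(1, 2, 9)); // exact duplicate, ignored
        for (AbstractMotif m : motifs) {
            System.out.println(m);
        }
    }
}
```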
<p><strong>Output</strong></p>
<p>Here is some current output of the basic semantic motif definition.</p>
<p><a href="http://bio.adking.co.uk/wp-content/uploads/2010/06/2.jpg"><img class="aligncenter size-medium wp-image-108" title="2" src="http://bio.adking.co.uk/wp-content/uploads/2010/06/2-300x271.jpg" alt="" width="300" height="271" /></a></p>
<p><strong>Differences</strong></p>
<p>This new implementation is different from the code previously developed by Cockell, S.J. for the 2010 paper. In <code>cutoff_data.xml.gz</code>, 157 motifs were identified, the same as with the other method. However, there is a difference when running the code on the larger dataset <code>ib2010_data.xml.gz</code>: previously 26,693 motifs were identified, compared with 29,633 using the new implementation, a difference of <strong>2,940</strong>.</p>
<p><strong>Note</strong></p>
<p>As a reminder, the following was discovered in the proper dataset:</p>
<ul>
<li>Drugs seem to have the ConceptClass &#8220;<code>Comp:Compound</code>&#8221;,</li>
<li>rather than the ConceptClass &#8220;<code>Compound</code>&#8221;.</li>
<li>There are 70 concepts of the latter, none of which seem to bind to a target.</li>
<li>There are over 8k concepts of the former, so I will use this.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://bio.adking.co.uk/2010/06/14/abstract-motifs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Dustaride</title>
		<link>http://bio.adking.co.uk/2010/06/13/dustaride/</link>
		<comments>http://bio.adking.co.uk/2010/06/13/dustaride/#comments</comments>
		<pubDate>Sun, 13 Jun 2010 09:01:51 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Ondex]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Test]]></category>
		<category><![CDATA[Output]]></category>
		<category><![CDATA[Testing]]></category>

		<guid isPermaLink="false">http://bio.adking.co.uk/?p=110</guid>
		<description><![CDATA[Motifs identified by traversing the Java output and finding the relevant nodes in the Ondex suite:]]></description>
			<content:encoded><![CDATA[<p>Motifs identified by traversing the Java output and finding the relevant nodes in the Ondex suite:</p>
<p style="text-align: center;"><a href="http://bio.adking.co.uk/wp-content/uploads/2010/06/dustaride-motif.png"><img class="aligncenter size-medium wp-image-113" title="dustaride motif" src="http://bio.adking.co.uk/wp-content/uploads/2010/06/dustaride-motif-300x135.png" alt="" width="300" height="135" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://bio.adking.co.uk/2010/06/13/dustaride/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chart</title>
		<link>http://bio.adking.co.uk/2010/06/10/chart/</link>
		<comments>http://bio.adking.co.uk/2010/06/10/chart/#comments</comments>
		<pubDate>Thu, 10 Jun 2010 10:29:44 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Project]]></category>

		<guid isPermaLink="false">http://bio.adking.co.uk/?p=81</guid>
		<description><![CDATA[Graph of what the basic motifs for protein similarity and drug similarity look like, so I can see the names of conceptClass and, more importantly, relationType. Potential solution (in progress) to this:- The code appears to be working: it traverses and prints out the information of the nodes (although not each individual motif).]]></description>
			<content:encoded><![CDATA[<p>Graph of what the basic motifs for protein similarity and drug similarity look like, so I can see the names of conceptClass and, more importantly, relationType.</p>
<p><a href="http://bio.adking.co.uk/wp-content/uploads/2010/06/p_p_similar.png"><img class="aligncenter size-full wp-image-82" title="p_p_similar" src="http://bio.adking.co.uk/wp-content/uploads/2010/06/p_p_similar.png" alt="metagraph" width="422" height="218" /></a></p>
<p><a href="http://bio.adking.co.uk/wp-content/uploads/2010/06/small-mot.png"><img class="aligncenter size-full wp-image-84" title="abstract motif" src="http://bio.adking.co.uk/wp-content/uploads/2010/06/small-mot.png" alt="abstract motif" width="259" height="199" /></a>Potential solution (<em>in progress</em>) to this:-</p>
<p><a href="http://bio.adking.co.uk/wp-content/uploads/2010/06/plan.png"><img class="aligncenter size-medium wp-image-86" title="plan" src="http://bio.adking.co.uk/wp-content/uploads/2010/06/plan-300x165.png" alt="" width="300" height="165" /></a></p>
<p>The code appears to be working: it traverses and prints out the information of the nodes (although not each individual motif).</p>
]]></content:encoded>
			<wfw:commentRss>http://bio.adking.co.uk/2010/06/10/chart/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>June 8</title>
		<link>http://bio.adking.co.uk/2010/06/08/june-8/</link>
		<comments>http://bio.adking.co.uk/2010/06/08/june-8/#comments</comments>
		<pubDate>Tue, 08 Jun 2010 13:19:34 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Project]]></category>

		<guid isPermaLink="false">http://bio.adking.co.uk/?p=73</guid>
		<description><![CDATA[Today: How to narrow down Not all relations have data attached to them, for example blast or sesame. GDS.getValue returns the &#8220;Tanimoto&#8221; value as an Object. Convert it to a double to compare to a double cutoff_value. Check the Tanimoto value before anything starts? I.e. an if statement in start() before the compounds are examined. Using 1.0 as a cutoff cut down the number of]]></description>
			<content:encoded><![CDATA[<p>Today:</p>
<ul>
<li>How to narrow down</li>
<li>Not all relations have data attached to them, for example blast or sesame.</li>
<li><code>GDS.getValue</code> returns the &#8220;Tanimoto&#8221; value as an Object. Convert it to a double to compare to a double cutoff_value.</li>
<li>Check the Tanimoto value before anything starts? I.e. an if statement in <code>start()</code> before the compounds are examined. Using 1.0 as a cutoff cut down the number of motifs found by 81%.</li>
</ul>
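<p>The Object-to-double conversion and cutoff test could look roughly like the sketch below. It deliberately does not call the Ondex API: the value is assumed to have already been fetched as an <code>Object</code>, the class and method names are hypothetical, and treating &#8220;at or above the cutoff&#8221; as passing is an assumption.</p>

```java
// Hypothetical helper; the Object would come from the relation's attribute.
final class TanimotoCutoff {

    // Returns true when the value is numeric and at or above the cutoff.
    // Relations with no attached data (null) or a non-numeric value fail.
    static boolean passes(Object value, double cutoffValue) {
        if (!(value instanceof Number)) {
            return false;
        }
        double tanimoto = ((Number) value).doubleValue();
        return tanimoto >= cutoffValue;
    }

    public static void main(String[] args) {
        double cutoffValue = 1.0;
        System.out.println(passes(Double.valueOf(1.0), cutoffValue));  // identical structures
        System.out.println(passes(Double.valueOf(0.85), cutoffValue)); // below the cutoff
        System.out.println(passes(null, cutoffValue));                 // relation without data
    }
}
```

<p>Doing this test first means relations without a Tanimoto value, such as those from blast, are skipped before any compounds are examined.</p>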
]]></content:encoded>
			<wfw:commentRss>http://bio.adking.co.uk/2010/06/08/june-8/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

