<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Semantic Motif Hunting &#187; Project</title>
	<atom:link href="http://bio.adking.co.uk/category/project/feed/" rel="self" type="application/rss+xml" />
	<link>http://bio.adking.co.uk</link>
	<description>Research Blog</description>
	<lastBuildDate>Wed, 06 Oct 2010 22:04:10 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>The end</title>
		<link>http://bio.adking.co.uk/2010/08/26/the-end/</link>
		<comments>http://bio.adking.co.uk/2010/08/26/the-end/#comments</comments>
		<pubDate>Thu, 26 Aug 2010 17:09:30 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Paper]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Finished]]></category>

		<guid isPermaLink="false">http://bio.adking.co.uk/?p=320</guid>
		<description><![CDATA[The paper is completed and submitted!]]></description>
			<content:encoded><![CDATA[<p>The paper is completed and submitted!</p>
]]></content:encoded>
			<wfw:commentRss>http://bio.adking.co.uk/2010/08/26/the-end/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>References</title>
		<link>http://bio.adking.co.uk/2010/08/18/references/</link>
		<comments>http://bio.adking.co.uk/2010/08/18/references/#comments</comments>
		<pubDate>Wed, 18 Aug 2010 19:45:56 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Paper]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Citations]]></category>
		<category><![CDATA[References]]></category>

		<guid isPermaLink="false">http://bio.adking.co.uk/?p=288</guid>
		<description><![CDATA[References are now all accounted for and are listed on the references page.]]></description>
			<content:encoded><![CDATA[<p>References are now all accounted for and are listed on the <a href="http://bio.adking.co.uk/references/">references page</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://bio.adking.co.uk/2010/08/18/references/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>FANMOD with 1000 random networks</title>
		<link>http://bio.adking.co.uk/2010/08/02/fanmod-with-1000-random/</link>
		<comments>http://bio.adking.co.uk/2010/08/02/fanmod-with-1000-random/#comments</comments>
		<pubDate>Mon, 02 Aug 2010 09:40:20 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Bioinformatics]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[FANMOD]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[Stats]]></category>

		<guid isPermaLink="false">http://bio.adking.co.uk/?p=270</guid>
		<description><![CDATA[Result Overview: Network name: D:\drugdatasetinput.txt Network type: Directed Number of nodes: 121320 Number of edges: 434800 Number of single edges: 307859 Number of mutual edges: 126941 Algorithm: enumeration Subgraph size: 3 Generated 1000 random networks with locally constant number of bidirectional edges, 3 exchanges per edge and 3 tries per edge. 172022813 subgraphs were enumerated]]></description>
			<content:encoded><![CDATA[<p><strong>Result Overview:</strong></p>
<p><code>Network name: D:\drugdatasetinput.txt<br />
Network type: Directed<br />
Number of nodes: 121320<br />
Number of edges: 434800<br />
Number of single edges: 307859<br />
Number of mutual edges: 126941</code></p>
<p>Algorithm: enumeration<br />
Subgraph size: 3</p>
<p>Generated 1000 random networks<br />
with locally constant number of bidirectional edges,<br />
3 exchanges per edge and 3 tries per edge.</p>
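<p>The Java below is a minimal sketch of degree-preserving edge switching, the general idea behind this kind of randomisation. It simplifies in several ways, all of them assumptions. The class and method names are hypothetical. Edges are plain int pairs of a simple directed graph. "Tries per edge" is read as the number of attempts allowed per swap. FANMOD's extra constraint of keeping the number of bidirectional edges locally constant is <strong>not</strong> enforced. The <code>tries</code>/<code>successes</code> counters mirror the figures reported for this run, where about 80.7% of tries succeeded (1274251639 / 1578398185).</p>

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

/** Hypothetical sketch: degree-preserving edge switching on a simple directed graph. */
class EdgeSwitcher {
    long tries = 0;     // swap attempts made
    long successes = 0; // attempts that produced a valid swap

    /** Packs the directed edge u->v into one long so it can be stored in a HashSet. */
    static long key(int u, int v) {
        return ((long) u << 32) | (v & 0xffffffffL);
    }

    /**
     * Returns a randomised copy of `edges` (the input is not modified). Performs
     * edges.size() * exchangesPerEdge swaps, allowing up to triesPerSwap attempts
     * for each (an assumed reading of "tries per edge"). Every node keeps its
     * in-degree and out-degree.
     */
    List<int[]> randomise(List<int[]> edges, int exchangesPerEdge, int triesPerSwap, Random rng) {
        List<int[]> result = new ArrayList<>();
        Set<Long> present = new HashSet<>();
        for (int[] e : edges) {
            result.add(new int[] {e[0], e[1]});
            present.add(key(e[0], e[1]));
        }
        int m = result.size();
        if (m < 2) return result;
        long swaps = (long) m * exchangesPerEdge;
        for (long s = 0; s < swaps; s++) {
            for (int t = 0; t < triesPerSwap; t++) {
                tries++;
                int[] e1 = result.get(rng.nextInt(m));
                int[] e2 = result.get(rng.nextInt(m));
                int a = e1[0], b = e1[1], c = e2[0], d = e2[1];
                // Rewire a->b, c->d into a->d, c->b unless that creates a self-loop
                // or an edge that already exists (this also rejects picking e1 == e2).
                if (a == d || c == b || present.contains(key(a, d)) || present.contains(key(c, b))) {
                    continue;
                }
                present.remove(key(a, b));
                present.remove(key(c, d));
                present.add(key(a, d));
                present.add(key(c, b));
                e1[1] = d;
                e2[1] = b;
                successes++;
                break;
            }
        }
        return result;
    }
}
```

<p>Rejected attempts still count towards <code>tries</code>. This is why the number of tries can exceed the number of successful exchanges.</p>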
<p>172022813 subgraphs were enumerated in the original network.<br />
197199994016 subgraphs were enumerated in the random networks.<br />
197372016829 subgraphs were enumerated in all networks.</p>
<p>For the random networks: 1578398185 tries were made, 1274251639 were successful.<br />
Randomization took 30053.2 seconds.<br />
Enumeration took 697127 seconds.</p>
<p><strong>Output Files:</strong></p>
<ul>
<li><a href="http://bio.adking.co.uk/wp-content/uploads/2010/08/drugdatasetinput_MOAR.txt">Output</a></li>
<li><a href="http://bio.adking.co.uk/wp-content/uploads/2010/08/drugdatasetinput_MOAR.html">HTML Overview</a></li>
</ul>
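<p>These totals mean the random networks averaged about 197.2 million size-3 subgraphs each (197199994016 / 1000), compared with 172022813 in the original network. Significance for each subgraph is usually summarised by comparing its count in the original network with its counts across the random ensemble. The Java sketch below computes a z-score and an empirical p-value in that style. It is illustrative only. The class and method names are hypothetical, and it works on raw counts, whereas FANMOD may report concentrations instead. Whether a population or sample standard deviation is used is not stated here, so the population form is assumed.</p>

```java
/** Hypothetical sketch of per-subgraph significance statistics against a random ensemble. */
class MotifStats {
    /** Mean of the counts observed in the random networks. */
    static double mean(double[] random) {
        double sum = 0;
        for (double r : random) sum += r;
        return sum / random.length;
    }

    /** z = (original - mean) / sd, using the population standard deviation; NaN if sd is 0. */
    static double zScore(double original, double[] random) {
        double mu = mean(random);
        double ss = 0;
        for (double r : random) ss += (r - mu) * (r - mu);
        double sd = Math.sqrt(ss / random.length);
        return sd == 0 ? Double.NaN : (original - mu) / sd;
    }

    /** Empirical p-value: fraction of random networks whose count is >= the original count. */
    static double empiricalP(double original, double[] random) {
        int atLeast = 0;
        for (double r : random) {
            if (r >= original) atLeast++;
        }
        return (double) atLeast / random.length;
    }
}
```

<p>For example, <code>MotifStats.zScore(16, new double[] {8, 10, 12})</code> returns about 3.67. <code>MotifStats.empiricalP(11, new double[] {8, 10, 12})</code> returns 1/3.</p>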
]]></content:encoded>
			<wfw:commentRss>http://bio.adking.co.uk/2010/08/02/fanmod-with-1000-random/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>FANMOD et al.</title>
		<link>http://bio.adking.co.uk/2010/07/20/fanmod-et-al/</link>
		<comments>http://bio.adking.co.uk/2010/07/20/fanmod-et-al/#comments</comments>
		<pubDate>Tue, 20 Jul 2010 13:43:20 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Project]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Stats]]></category>

		<guid isPermaLink="false">http://bio.adking.co.uk/?p=245</guid>
		<description><![CDATA[FANMOD Size 3 Subgraph (100,000 Samples, 100 Random Networks) Approximate number of subgraphs: 265326840 (based on 100000 samples) 172022813 subgraphs were enumerated in the original network. Result File Size 6 Subgraph (100,000 Samples, 100 Random Networks) Approximate number of subgraphs: 9365924517517080 (based on 100000 samples) MFINDER1.2 Size 3 Subgraph (Default Settings) - Size 6 Subgraph (Default Settings) -]]></description>
			<content:encoded><![CDATA[<p><strong>FANMOD</strong></p>
<p>Size 3 Subgraph (100,000 Samples, 100 Random Networks)</p>
<ul>
<li>Approximate number of subgraphs: 265326840 (based on 100000 samples)</li>
<li>172022813 subgraphs were enumerated in the original network.</li>
<li><a href="http://bio.adking.co.uk/wp-content/uploads/2010/07/drugdatasetinput_3.html">Result File</a></li>
</ul>
<p>Size 6 Subgraph (100,000 Samples, 100 Random Networks)</p>
<ul>
<li>Approximate number of subgraphs: 9365924517517080 (based on 100000 samples)</li>
</ul>
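<p>FANMOD is built on the ESU enumeration algorithm (Wernicke) and its sampling variant RAND-ESU. RAND-ESU is what produces sample-based approximations like the ones above. The Java sketch below covers only exact ESU counting, and it makes several assumptions. Edge direction is ignored when testing connectivity. Vertices are numbered 0..n-1. The class name is hypothetical. Subgraphs are counted but not sampled, and they are not classified into isomorphism classes.</p>

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of ESU (EnumerateSubgraphs): counts connected induced
 * subgraphs of size k in an undirected graph, each exactly once.
 */
class Esu {
    private final List<Set<Integer>> adj; // adj.get(v) = neighbours of v
    private final int k;
    private long count;

    Esu(List<Set<Integer>> adj, int k) {
        this.adj = adj;
        this.k = k;
    }

    long countSubgraphs() {
        count = 0;
        for (int v = 0; v < adj.size(); v++) {
            // Only extend with neighbours whose label is larger than the root v.
            Set<Integer> ext = new HashSet<>();
            for (int u : adj.get(v)) {
                if (u > v) ext.add(u);
            }
            List<Integer> sub = new ArrayList<>();
            sub.add(v);
            extend(sub, ext, v);
        }
        return count;
    }

    private void extend(List<Integer> sub, Set<Integer> ext, int v) {
        if (sub.size() == k) {
            count++;
            return;
        }
        // Closed neighbourhood of the current subgraph: its vertices and their neighbours.
        Set<Integer> closed = new HashSet<>(sub);
        for (int s : sub) closed.addAll(adj.get(s));
        Set<Integer> remaining = new HashSet<>(ext);
        while (!remaining.isEmpty()) {
            Iterator<Integer> it = remaining.iterator();
            int w = it.next();
            it.remove();
            // New extension set: what is left, plus w's exclusive neighbours above v.
            Set<Integer> nextExt = new HashSet<>(remaining);
            for (int u : adj.get(w)) {
                if (u > v && !closed.contains(u)) nextExt.add(u);
            }
            sub.add(w);
            extend(sub, nextExt, v);
            sub.remove(sub.size() - 1);
        }
    }
}
```

<p>Take a triangle 0-1-2 with an extra edge 2-3. Here <code>new Esu(adj, 3).countSubgraphs()</code> returns 3: {0,1,2}, {0,2,3} and {1,2,3}. Two rules stop a subgraph from being counted more than once: extensions may only use labels above the root, and they may only come from exclusive neighbours.</p>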
<p><strong>MFINDER1.2</strong></p>
<p>Size 3 Subgraph (Default Settings)</p>
<ul>
<li>-</li>
</ul>
<p>Size 6 Subgraph (Default Settings)</p>
<ul>
<li>-</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://bio.adking.co.uk/2010/07/20/fanmod-et-al/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Stats</title>
		<link>http://bio.adking.co.uk/2010/07/13/stats-2/</link>
		<comments>http://bio.adking.co.uk/2010/07/13/stats-2/#comments</comments>
		<pubDate>Tue, 13 Jul 2010 12:40:11 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Paper]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Stats]]></category>

		<guid isPermaLink="false">http://bio.adking.co.uk/?p=224</guid>
		<description><![CDATA[Motif A (Chlorpromazine): The neighbourhood of this motif contains 86 Edges and 76 Nodes (162 in total). Chlorpromazine and Trimeprazine have a Tanimoto coefficient of 0.85. There are 60 other compounds that bind to the H1 Histamine Receptor. Chlorpromazine has 3 other similar drugs (Tanimoto coefficient of 0.85 in each case).]]></description>
			<content:encoded><![CDATA[<p>Motif A (Chlorpromazine):</p>
<ul>
<li>The neighbourhood of this motif contains <strong>86 Edges</strong> and <strong>76 Nodes</strong> (162 in total).</li>
<li>Chlorpromazine and Trimeprazine have a Tanimoto coefficient of <strong>0.85</strong>.</li>
<li>There are <strong>60 other</strong> compounds that bind to the H1 Histamine Receptor.</li>
<li>Chlorpromazine has <strong>3 other similar drugs</strong> (Tanimoto coefficient of 0.85 in each case).</li>
<li>Trimeprazine is similar to <strong>4 additional compounds</strong>, with Tanimoto coefficients between <strong>0.87 and 1.0</strong>.</li>
<li>The H1 Histamine Receptor has <strong>1 additional target</strong> and is associated with <strong>2 proteins</strong>, one of which has a similar structure.</li>
</ul>
<p>Motif A (<a title="Lenalidomide" href="http://en.wikipedia.org/wiki/Lenalidomide" target="_blank">Lenalidomide</a>)</p>
<ul>
<li>This motif contains <strong>21 Nodes</strong> and <strong>25 Edges</strong> (46 in total).</li>
<li>Lenalidomide has <strong>1</strong> additional target.</li>
<li>Thalidomide has <strong>2</strong> additional targets.</li>
<li>The Tanimoto coefficient between Lenalidomide and Thalidomide is <strong>0.85</strong> in both directions.</li>
<li>TNF has <strong>14</strong> other drugs that bind to it and is associated with 1 protein.</li>
</ul>
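<p>The Tanimoto coefficient of two binary fingerprints A and B is the number of bits set in both, divided by the number of bits set in either. The Java sketch below computes it with <code>java.util.BitSet</code>. This post does not say which fingerprint type was used to build the dataset's similarity links, so any bit encoding is an assumption, and the class name is hypothetical. The formula is symmetric in A and B. Comparing the same pair of fingerprints should therefore give the same value in both directions, as reported for Lenalidomide and Thalidomide.</p>

```java
import java.util.BitSet;

/** Hypothetical sketch: Tanimoto coefficient of two binary fingerprints. */
class Tanimoto {
    /** |A AND B| / |A OR B|; returns 0.0 when both fingerprints are empty (a chosen convention). */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();
        both.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        int union = either.cardinality();
        return union == 0 ? 0.0 : (double) both.cardinality() / union;
    }
}
```

<p>For example, fingerprints with bits {0,1,2,3} and {1,2,3,4} set give a coefficient of 3/5 = 0.6.</p>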
<p>Motif C (Isradipine)</p>
<ul>
<li>The BLAST E-value between CACH2 and SCN4A is <strong>2.0E-63</strong>.</li>
<li>The BLAST E-value in the reverse direction is <strong>1.0E-59</strong>.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://bio.adking.co.uk/2010/07/13/stats-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Improvements</title>
		<link>http://bio.adking.co.uk/2010/07/01/improvements/</link>
		<comments>http://bio.adking.co.uk/2010/07/01/improvements/#comments</comments>
		<pubDate>Thu, 01 Jul 2010 11:35:22 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[improvements]]></category>
		<category><![CDATA[Motifs]]></category>

		<guid isPermaLink="false">http://bio.adking.co.uk/?p=185</guid>
		<description><![CDATA[TypeA Removed all motifs where both similar drugs bind to the target. 30,231 motifs were originally identified, now 27,499 (-2,732). TypeB Removed motifs where the drug binds to both targets. 22,864 motifs were originally identified, now 19,185 (-3,679). TypeC Removed motifs where protein(1) inv_in disease(2) or protein(2) inv_in disease(1). 70,866 motifs were originally identified in]]></description>
			<content:encoded><![CDATA[<p><strong>TypeA</strong></p>
<ul>
<li>Removed all motifs where both similar drugs bind to the target.</li>
<li>30,231 motifs were originally identified, now <strong>27,499</strong> (-2,732).</li>
</ul>
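<p>In effect, the Type A filter drops every instance where both drugs already bind the target. The minimal Java sketch below assumes that each instance is held as a (drug1, drug2, target) triple and that binding edges are stored as <code>"drug->target"</code> strings. The classes, the string encoding and the method names are hypothetical. This is not the project's Ondex-based code.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical Type A instance: drug1 and drug2 are similar, and drug1 binds target. */
class TypeAMotif {
    final String drug1, drug2, target;

    TypeAMotif(String drug1, String drug2, String target) {
        this.drug1 = drug1;
        this.drug2 = drug2;
        this.target = target;
    }
}

class TypeAFilter {
    /** Returns a copy keeping only motifs where the two drugs do not both bind the target. */
    static List<TypeAMotif> filter(List<TypeAMotif> motifs, Set<String> binds) {
        List<TypeAMotif> kept = new ArrayList<>(motifs);
        kept.removeIf(m -> binds.contains(m.drug1 + "->" + m.target)
                && binds.contains(m.drug2 + "->" + m.target));
        return kept;
    }
}
```

<p><code>removeIf</code> runs on a mutable copy, so the input list is left untouched.</p>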
<p><strong>TypeB</strong></p>
<ul>
<li>Removed motifs where the drug binds to both targets.</li>
<li>22,864 motifs were originally identified, now <strong>19,185</strong> (-3,679).</li>
</ul>
<p><strong>TypeC</strong></p>
<ul>
<li>Removed motifs where protein(1) inv_in disease(2) or protein(2) inv_in disease(1).</li>
<li>70,866 motifs were originally identified in the dataset, now <strong>70,715</strong> (-151).</li>
<li>Tried removing motifs where the target connects to protein(2). This did not take into account that some links are has_similar_sequence/structure rather than is_a, and it left only <strong>200</strong> (-70,515).</li>
<li>Removed motifs where the target is_a protein(2); this made no difference to the total.</li>
</ul>
<p><strong>Ondex</strong></p>
<ul>
<li>Selecting a node with a certain id in Ondex?</li>
<li>Too hard when there are 500k nodes</li>
<li>Search doesn&#8217;t work</li>
<li>Ondex Console: how is this used? Surprisingly, no help is available.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://bio.adking.co.uk/2010/07/01/improvements/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mare</title>
		<link>http://bio.adking.co.uk/2010/06/28/146/</link>
		<comments>http://bio.adking.co.uk/2010/06/28/146/#comments</comments>
		<pubDate>Mon, 28 Jun 2010 09:58:02 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Paper]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[plan]]></category>

		<guid isPermaLink="false">http://bio.adking.co.uk/?p=146</guid>
		<description><![CDATA[Motif Finder Exported the datasets, ready for use with software (motif finder) that searches for over-represented motifs (hypergeometric tests: 1 day&#8217;s work). Cytoscape has a plugin. ib_2010_data.1.xml.gz &#8594; .dot (50 MB) ib_2010_data.1.xml.gz &#8594; graph htm??? (can&#8217;t remember, but it was 450 MB) Motif Type C Trying to define another motif, struggling&#8230; similar targets? Plan Results/Code/Other program]]></description>
			<content:encoded><![CDATA[<p><strong>Motif Finder</strong></p>
<p>Exported the datasets, ready for use with software (motif finder) that searches for over-represented motifs (hypergeometric tests: 1 day&#8217;s work). Cytoscape has a plugin.</p>
<ul>
<li>ib_2010_data.1.xml.gz &#8594; .dot (50 MB)</li>
<li>ib_2010_data.1.xml.gz &#8594; graph htm??? (can&#8217;t remember, but it was 450 MB)</li>
</ul>
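<p>The hypergeometric tests mentioned above need an upper-tail probability. Suppose n items are drawn without replacement from N items, K of which are "successes". The probability of seeing at least k successes is P(X >= k), the sum over x >= k of C(K, x) C(N-K, n-x) / C(N, n). The Java sketch below computes it. This post does not say how motif counts would map onto N, K, n and k, so those parameters are placeholders, and the class name is hypothetical. Log-factorials are summed directly, which keeps the code simple but is slow for very large N.</p>

```java
/** Hypothetical sketch: upper-tail hypergeometric p-value P(X >= k). */
class Hypergeometric {
    /** log(n!) computed by direct summation; adequate for moderate n. */
    static double logFactorial(int n) {
        double s = 0;
        for (int i = 2; i <= n; i++) s += Math.log(i);
        return s;
    }

    /** log of the binomial coefficient C(n, r). */
    static double logChoose(int n, int r) {
        return logFactorial(n) - logFactorial(r) - logFactorial(n - r);
    }

    /**
     * P(X >= k) when n items are drawn without replacement from a population
     * of size N containing K successes.
     */
    static double upperTail(int N, int K, int n, int k) {
        double p = 0;
        int maxX = Math.min(K, n);
        for (int x = Math.max(k, 0); x <= maxX; x++) {
            if (n - x > N - K) continue; // not enough failures for this outcome
            p += Math.exp(logChoose(K, x) + logChoose(N - K, n - x) - logChoose(N, n));
        }
        return Math.min(1.0, p); // guard against rounding just above 1
    }
}
```

<p>For example, <code>Hypergeometric.upperTail(4, 2, 2, 1)</code> gives 5/6. Drawing 2 of 4 items, 2 of which are marked, yields at least one marked item with probability 5/6.</p>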
<p><strong>Motif Type C</strong></p>
<p>Trying to define another motif, struggling&#8230; similar targets?</p>
<div id="attachment_170" class="wp-caption aligncenter" style="width: 211px"><a href="http://bio.adking.co.uk/wp-content/uploads/2010/06/MotifB.png"><img src="http://bio.adking.co.uk/wp-content/uploads/2010/06/MotifB.png" alt="B" title="MotifB" width="201" height="178" class="size-full wp-image-170" /></a><p class="wp-caption-text">B</p></div>
<p><strong>Plan</strong></p>
<p><strong>Results/Code/Other program (June-July 2010)</strong></p>
<p>4 Weeks</p>
<ul>
<li>Week 1 &#8211; Code/Motif Finder (e.g. Motif A, B, C&#8230; with x, y, z instances)</li>
<li>Week 2 &#8211; Java/Score/FoundExamples</li>
<li>Week 3 &#8211; Score</li>
<li>Week 4 &#8211; Score</li>
</ul>
<p><strong>Paper (July-August 2010)</strong></p>
<ul>
<li>Intro</li>
<li>Background</li>
<li>Methods
<ul>
<li>Definitions of semantic motifs (S.M.)</li>
<li>Searching</li>
<li>Scoring</li>
</ul>
</li>
<li>Results
<ul>
<li>TypeA</li>
<li>TypeB</li>
<li>TypeC</li>
</ul>
</li>
<li>Discussion
<ul>
<li>Future work</li>
</ul>
</li>
<li>Abstract</li>
</ul>
<p><a title="Webpage" href="https://internal.cs.ncl.ac.uk/modules/2009-10/csc8399/project_deliverables.html" target="_blank">Module Webpage</a></p>
]]></content:encoded>
			<wfw:commentRss>http://bio.adking.co.uk/2010/06/28/146/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Motif Definition 2</title>
		<link>http://bio.adking.co.uk/2010/06/17/motif-definition-2/</link>
		<comments>http://bio.adking.co.uk/2010/06/17/motif-definition-2/#comments</comments>
		<pubDate>Thu, 17 Jun 2010 14:08:58 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Ondex]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Project]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[motif]]></category>
		<category><![CDATA[protein]]></category>

		<guid isPermaLink="false">http://bio.adking.co.uk/?p=125</guid>
		<description><![CDATA[The second motif that was defined has been completed in Java. Example output from Java (cutoff dataset): The code was run on the updated dataset: There were 70,866 instances of this motif definition in the dataset. However, this doesn&#8217;t address the issue of the OMIM database: many of these instances will not involve true &#8220;Diseases&#8221;. This]]></description>
			<content:encoded><![CDATA[<p>The second motif that was defined has been completed in Java.</p>
<p><a href="http://bio.adking.co.uk/wp-content/uploads/2010/06/p_p_similar.png"><img class="aligncenter size-full wp-image-82" title="p_p_similar" src="http://bio.adking.co.uk/wp-content/uploads/2010/06/p_p_similar.png" alt="metagraph" width="422" height="218" /></a>Example output from Java (cutoff dataset):</p>
<p><a href="http://bio.adking.co.uk/wp-content/uploads/2010/06/newAbstractBig.jpg"><img class="aligncenter size-full wp-image-126" title="proteintoprotein" src="http://bio.adking.co.uk/wp-content/uploads/2010/06/newAbstractBig.jpg" alt="Protein-Protein-Similarity" width="259" height="262" /></a></p>
<p>The code was run on the updated dataset:</p>
<ul>
<li>There were <strong>70,866</strong> instances of this motif definition in the dataset.</li>
<li>However, this doesn&#8217;t address the issue of the OMIM database: many of these instances will not involve true &#8220;Diseases&#8221;. This would probably be best addressed when scoring this motif type, if that is even possible.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://bio.adking.co.uk/2010/06/17/motif-definition-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Meeting</title>
		<link>http://bio.adking.co.uk/2010/06/16/meeting/</link>
		<comments>http://bio.adking.co.uk/2010/06/16/meeting/#comments</comments>
		<pubDate>Wed, 16 Jun 2010 10:32:06 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Project]]></category>
		<category><![CDATA[dataset]]></category>
		<category><![CDATA[meeting]]></category>

		<guid isPermaLink="false">http://bio.adking.co.uk/?p=120</guid>
		<description><![CDATA[16/06/2010 Project Meeting The issue with &#8216;Comp:Compound&#8217; is being resolved; compounds will be known as &#8216;Compound&#8217; after the fix. Disease nodes &#8211; some are diseases, but others are not; they are other things, such as genes, due to OMIM. Find more examples like Chlorpromazine, as well as more complex ones. Update: The dataset has been updated 30,231]]></description>
			<content:encoded><![CDATA[<p><strong>16/06/2010</strong> <strong>Project Meeting</strong></p>
<ul>
<li>The issue with &#8216;<code>Comp:Compound</code>&#8217; is being resolved; compounds will be known as &#8216;<code>Compound</code>&#8217; after the fix.</li>
<li>Disease nodes &#8211; some are diseases, but others are not; they are other things, such as genes, due to <abbr title="Online Mendelian Inheritance in Man">OMIM</abbr>.</li>
<li>Find more examples like Chlorpromazine, as well as more complex ones.</li>
</ul>
<p>Update: The dataset has been updated</p>
<ul>
<li>30,231 Motifs (Basic Type) is now the latest count.</li>
<li>Still can&#8217;t get Concept names to print out (doing it wrong?)</li>
</ul>
<p><strong>New dataset info:</strong></p>
<p><a href="http://bio.adking.co.uk/wp-content/uploads/2010/06/newdata.jpg"><img class="aligncenter size-full wp-image-128" title="newdata" src="http://bio.adking.co.uk/wp-content/uploads/2010/06/newdata.jpg" alt="" width="315" height="635" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://bio.adking.co.uk/2010/06/16/meeting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cross Checking</title>
		<link>http://bio.adking.co.uk/2010/06/15/cross-checking/</link>
		<comments>http://bio.adking.co.uk/2010/06/15/cross-checking/#comments</comments>
		<pubDate>Tue, 15 Jun 2010 21:35:49 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Project]]></category>

		<guid isPermaLink="false">http://bio.adking.co.uk/?p=93</guid>
		<description><![CDATA[Linux (Ubuntu) was successfully installed on the machine, along with the latest version of Ondex. The large dataset seemed to work fine in Linux with 3.6 GB of memory allowed for the program. I searched for Chlorpromazine within the dataset to get the node IDs of the semantic motif discussed in the paper. Drug 1 &#8211;]]></description>
			<content:encoded><![CDATA[<p>Linux (Ubuntu) was successfully installed on the machine, along with the latest version of Ondex. The large dataset seemed to work fine in Linux with 3.6 GB of memory allowed for the program. I searched for Chlorpromazine within the dataset to get the node IDs of the semantic motif discussed in the paper.</p>
<ul>
<li>Drug 1 &#8211; Chlorpromazine &#8211; ID:120943</li>
<li>Drug 2 &#8211; Trimeprazine &#8211; ID:120836</li>
<li>Target &#8211; Histamine H1 Receptor &#8211; ID:715</li>
</ul>
<p>I then traversed the output of the Java code in Ondex Mini and successfully found the motif:</p>
<p><a href="http://bio.adking.co.uk/wp-content/uploads/2010/06/chlor.jpg"><img class="aligncenter size-medium wp-image-106" title="Chlorpromazine Motif" src="http://bio.adking.co.uk/wp-content/uploads/2010/06/chlor-300x168.jpg" alt="Chlorpromazine Motif" width="300" height="168" /></a></p>
<p><strong>Definition of a more complex example</strong></p>
<ul>
<li>Started implementing a more complex motif, i.e. one that looks for similar proteins.</li>
<li>Developed it using the chart created in a previous post.</li>
<li>It seems to be working, although there are some issues with the Disease output and with deciding when to increment the motif counter/add to the motif Set.</li>
</ul>
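<p>One way to settle when to increment the counter is to derive the count from the Set itself. Each instance is added under a canonical key, and it only counts when the key is new. The minimal Java sketch below assumes the two drugs in a motif are interchangeable, so that (A, B, target) and (B, A, target) are the same instance. That holds only if the motif definition is symmetric in the drugs. The class and method names are hypothetical.</p>

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: count each motif instance once via a canonical key. */
class MotifRegistry {
    private final Set<String> seen = new HashSet<>();

    /** Orders the two drug IDs so (a, b, t) and (b, a, t) share one key. */
    static String canonicalKey(int drugA, int drugB, int target) {
        int lo = Math.min(drugA, drugB);
        int hi = Math.max(drugA, drugB);
        return lo + ":" + hi + ":" + target;
    }

    /** Returns true only the first time an instance is seen. */
    boolean add(int drugA, int drugB, int target) {
        return seen.add(canonicalKey(drugA, drugB, target));
    }

    /** The motif count is simply the number of distinct keys stored. */
    int count() {
        return seen.size();
    }
}
```

<p>Using the IDs above, <code>add(120943, 120836, 715)</code> returns true. A second call as <code>add(120836, 120943, 715)</code> returns false, and <code>count()</code> stays at 1.</p>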
<p><a href="http://bio.adking.co.uk/wp-content/uploads/2010/06/bigger.jpg"><img class="aligncenter size-full wp-image-117" title="ExampleOutput" src="http://bio.adking.co.uk/wp-content/uploads/2010/06/bigger.jpg" alt="" width="540" height="243" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://bio.adking.co.uk/2010/06/15/cross-checking/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

