<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments for Teofil Achirei</title>
	<atom:link href="http://teofilachirei.wordpress.com/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://teofilachirei.wordpress.com</link>
	<description>Teofil Achirei's official blog</description>
	<lastBuildDate>Thu, 03 Sep 2009 12:30:43 +0000</lastBuildDate>
	<generator>http://wordpress.com/</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>Comment on Evil Links by Google Summer of Code 2009 &#171; Teofil Achirei</title>
		<link>http://teofilachirei.wordpress.com/2009/04/01/evil-links/#comment-49</link>
		<dc:creator>Google Summer of Code 2009 &#171; Teofil Achirei</dc:creator>
		<pubDate>Thu, 03 Sep 2009 12:30:43 +0000</pubDate>
		<guid isPermaLink="false">http://teofilachirei.wordpress.com/?p=191#comment-49</guid>
		<description>[...] am going to start a project about style-sheet extraction, CSS analyzing and CSS optimizing&#8221; here? Well, I was talking about the style support for XOffice. This page has more information about this [...]</description>
		<content:encoded><![CDATA[<p>[...] am going to start a project about style-sheet extraction, CSS analyzing and CSS optimizing&#8221; here? Well, I was talking about the style support for XOffice. This page has more information about this [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Evil Links by dangiankit</title>
		<link>http://teofilachirei.wordpress.com/2009/04/01/evil-links/#comment-48</link>
		<dc:creator>dangiankit</dc:creator>
		<pubDate>Tue, 14 Jul 2009 09:09:26 +0000</pubDate>
		<guid isPermaLink="false">http://teofilachirei.wordpress.com/?p=191#comment-48</guid>
		<description>You might be interested in learning about Crawling Web 2.0 Applications. I have cited two papers in the &lt;a href=&quot;http://en.wikipedia.org/wiki/Web_crawler#Crawling_Web_2.0_Applications&quot; rel=&quot;nofollow&quot;&gt;Wikipedia article on web crawlers&lt;/a&gt;. If you find more information, do drop me an email and update that article for the rest of the world.</description>
		<content:encoded><![CDATA[<p>You might be interested in learning about Crawling Web 2.0 Applications. I have cited two papers in the <a href="http://en.wikipedia.org/wiki/Web_crawler#Crawling_Web_2.0_Applications" rel="nofollow">Wikipedia article on web crawlers</a>. If you find more information, do drop me an email and update that article for the rest of the world.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Classification and clustering [part one] by teofilachirei</title>
		<link>http://teofilachirei.wordpress.com/2009/05/17/classification-clustering-1/#comment-43</link>
		<dc:creator>teofilachirei</dc:creator>
		<pubDate>Mon, 18 May 2009 19:23:19 +0000</pubDate>
		<guid isPermaLink="false">http://teofilachirei.wordpress.com/?p=294#comment-43</guid>
		<description>Right now, using the Thesaurus and Link classes, the crawler can only determine if a document is interesting (it&#039;s on topic) and if *some* links also look interesting and should be considered for crawling.
The next step (and I&#039;ll post about it pretty soon) is to determine as accurately as possible which links are worth following. That&#039;s the reason the thesaurus is so simple right now.
Also, the number of links extracted from a given page will probably be limited, because I don&#039;t want to crawl only one website.
After this step is complete, I&#039;ll modify the Thesaurus class so it can be used for classifying documents (pages). Simply put: the thesaurus will have &quot;categories&quot; and each category will have a certain list of keywords. A hash map should be simple enough for this.

To summarize:
- cluster or classify documents, not links
- use the thesaurus just for the Best-First algorithm
- the thesaurus will soon be improved and I&#039;ll come up with some simple document classification</description>
		<content:encoded><![CDATA[<p>Right now, using the Thesaurus and Link classes, the crawler can only determine if a document is interesting (it&#8217;s on topic) and if *some* links also look interesting and should be considered for crawling.<br />
The next step (and I&#8217;ll post about it pretty soon) is to determine as accurately as possible which links are worth following. That&#8217;s the reason the thesaurus is so simple right now.<br />
Also, the number of links extracted from a given page will probably be limited, because I don&#8217;t want to crawl only one website.<br />
After this step is complete, I&#8217;ll modify the Thesaurus class so it can be used for classifying documents (pages). Simply put: the thesaurus will have &#8220;categories&#8221; and each category will have a certain list of keywords. A hash map should be simple enough for this.</p>
<p>To summarize:<br />
- cluster or classify documents, not links<br />
- use the thesaurus just for the Best-First algorithm<br />
- the thesaurus will soon be improved and I&#8217;ll come up with some simple document classification</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Classification and clustering [part one] by Ionut Popa</title>
		<link>http://teofilachirei.wordpress.com/2009/05/17/classification-clustering-1/#comment-42</link>
		<dc:creator>Ionut Popa</dc:creator>
		<pubDate>Mon, 18 May 2009 13:38:13 +0000</pubDate>
		<guid isPermaLink="false">http://teofilachirei.wordpress.com/?p=294#comment-42</guid>
		<description>So are you going to use the classifier class to classify both the page and the links? Or, if the document matches the thesaurus, will you extract all the links from it?</description>
		<content:encoded><![CDATA[<p>So are you going to use the classifier class to classify both the page and the links? Or, if the document matches the thesaurus, will you extract all the links from it?</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Google Summer of Code 2009 by Ionut Popa</title>
		<link>http://teofilachirei.wordpress.com/2009/05/11/google-summer-of-code-2009/#comment-41</link>
		<dc:creator>Ionut Popa</dc:creator>
		<pubDate>Tue, 12 May 2009 16:05:13 +0000</pubDate>
		<guid isPermaLink="false">http://teofilachirei.wordpress.com/?p=276#comment-41</guid>
		<description>Congratulations, and good luck with the work</description>
		<content:encoded><![CDATA[<p>Congratulations, and good luck with the work</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Google Summer of Code 2009 by Amit &#124;Web Design</title>
		<link>http://teofilachirei.wordpress.com/2009/05/11/google-summer-of-code-2009/#comment-40</link>
		<dc:creator>Amit &#124;Web Design</dc:creator>
		<pubDate>Tue, 12 May 2009 09:35:57 +0000</pubDate>
		<guid isPermaLink="false">http://teofilachirei.wordpress.com/?p=276#comment-40</guid>
		<description>Hey
Good luck and enjoy
Let us know how it was :)</description>
		<content:encoded><![CDATA[<p>Hey<br />
Good luck and enjoy<br />
Let us know how it was <img src='http://s.wordpress.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Web Crawler Architectures by Does your ISP block Web Crawling? &#171; NetEqualizer News Blog</title>
		<link>http://teofilachirei.wordpress.com/2009/04/14/web-crawler-architectures/#comment-22</link>
		<dc:creator>Does your ISP block Web Crawling? &#171; NetEqualizer News Blog</dc:creator>
		<pubDate>Thu, 16 Apr 2009 14:17:16 +0000</pubDate>
		<guid isPermaLink="false">http://teofilachirei.wordpress.com/?p=248#comment-22</guid>
		<description>[...] See also a generic flow diagram of a Web Crawler. [...]</description>
		<content:encoded><![CDATA[<p>[...] See also a generic flow diagram of a Web Crawler. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Evil Links by 7kittens</title>
		<link>http://teofilachirei.wordpress.com/2009/04/01/evil-links/#comment-14</link>
		<dc:creator>7kittens</dc:creator>
		<pubDate>Wed, 08 Apr 2009 23:27:55 +0000</pubDate>
		<guid isPermaLink="false">http://teofilachirei.wordpress.com/?p=191#comment-14</guid>
		<description>You&#039;re welcome. Post looks great.</description>
		<content:encoded><![CDATA[<p>You&#8217;re welcome. Post looks great.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Evil Links by teofilachirei</title>
		<link>http://teofilachirei.wordpress.com/2009/04/01/evil-links/#comment-12</link>
		<dc:creator>teofilachirei</dc:creator>
		<pubDate>Tue, 07 Apr 2009 09:50:30 +0000</pubDate>
		<guid isPermaLink="false">http://teofilachirei.wordpress.com/?p=191#comment-12</guid>
		<description>Thank you very much! I had been searching for something like that for a long time. I tried the &lt;strong&gt;code&lt;/strong&gt; tag, but it looked awful with my current theme.
It really helps!
I&#039;ll try to update my posts to replace the old &lt;strong&gt;pre&lt;/strong&gt; tags with &lt;strong&gt;sourcecode&lt;/strong&gt;</description>
		<content:encoded><![CDATA[<p>Thank you very much! I had been searching for something like that for a long time. I tried the <strong>code</strong> tag, but it looked awful with my current theme.<br />
It really helps!<br />
I&#8217;ll try to update my posts to replace the old <strong>pre</strong> tags with <strong>sourcecode</strong></p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on Evil Links by 7kittens</title>
		<link>http://teofilachirei.wordpress.com/2009/04/01/evil-links/#comment-11</link>
		<dc:creator>7kittens</dc:creator>
		<pubDate>Mon, 06 Apr 2009 23:48:39 +0000</pubDate>
		<guid isPermaLink="false">http://teofilachirei.wordpress.com/?p=191#comment-11</guid>
		<description>Oh, by the way, there is a &lt;b&gt;sourcecode&lt;/b&gt; tag to format and color-code your example code, like so:
&lt;pre class=&quot;brush: xml;&quot;&gt;
&lt;script type=&quot;text/javascript&quot;&gt;
   document.write(&quot;&lt;div style=&#039;display:none&#039;&gt;&quot;);
&lt;/script&gt;

&lt;a href=&quot;URL6&quot; rel=&quot;nofollow&quot;&gt;Hidden Link 6&lt;/a&gt;

&lt;script type=&quot;text/javascript&quot;&gt;
   document.write(&quot;&lt;/div&gt;&quot;);
&lt;/script&gt;
&lt;/pre&gt;

Hope this helps.</description>
		<content:encoded><![CDATA[<p>Oh, by the way, there is a <b>sourcecode</b> tag to format and color-code your example code, like so:</p>
<pre class="brush: xml;">
&lt;script type=&quot;text/javascript&quot;&gt;
   document.write(&quot;&lt;div style='display:none'&gt;&quot;);
&lt;/script&gt;

&lt;a href=&quot;URL6&quot; rel=&quot;nofollow&quot;&gt;Hidden Link 6&lt;/a&gt;

&lt;script type=&quot;text/javascript&quot;&gt;
   document.write(&quot;&lt;/div&gt;&quot;);
&lt;/script&gt;
</pre>
<p>Hope this helps.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
