# Monthly Archives: January 2007

# Information Retrieval Tools

http://kaukoai.iis.sinica.edu.tw/software.html

http://www.webir.org/

http://www.aaai.org/AITopics/html/info.html

My advisor gave me three IR-related resources.

Hopefully they will help with the implementation of Bib Agent.

They will very likely come in handy in next semester's Data Mining & IR class too ^^

# All I learned from the first semester

**Neural Network :**

Idea : Each neuron acts as a hyperplane; together the neurons separate the training examples by class.

How to : Use the back-propagation algorithm to train the neural network.
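As a rough sketch of what back-propagation computes, here is a tiny network with one hidden layer written in plain Python. The layer sizes, the sigmoid activation, the squared-error loss, and all function names are my own assumptions for illustration, not something taken from the course.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    # params = (W1, b1, W2, b2): one hidden layer, one sigmoid output neuron.
    W1, b1, W2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y


def loss(params, x, t):
    # Squared error for a single training example (assumed loss).
    _, y = forward(params, x)
    return 0.5 * (y - t) ** 2


def backprop(params, x, t):
    # Propagate the output error backwards to get the gradient of the loss.
    W1, b1, W2, b2 = params
    h, y = forward(params, x)
    delta_out = (y - t) * y * (1.0 - y)            # dL/dz at the output neuron
    grad_W2 = [delta_out * hi for hi in h]
    grad_b2 = delta_out
    delta_hidden = [delta_out * w * hi * (1.0 - hi)  # dL/dz at each hidden neuron
                    for w, hi in zip(W2, h)]
    grad_W1 = [[d * xi for xi in x] for d in delta_hidden]
    grad_b1 = delta_hidden
    return grad_W1, grad_b1, grad_W2, grad_b2


def sgd_step(params, x, t, lr=0.5):
    # One gradient-descent update on a single example.
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = backprop(params, x, t)
    W1 = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W1, gW1)]
    b1 = [b - lr * g for b, g in zip(b1, gb1)]
    W2 = [w - lr * g for w, g in zip(W2, gW2)]
    b2 = b2 - lr * gb2
    return W1, b1, W2, b2
```

Calling `sgd_step` repeatedly over the training examples is the training loop; the learning rate `lr` is a free parameter to tune.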

**Computational Complexity :**

1. The computational model of the Turing machine.

All problems that can be solved by existing machines can also be solved by Turing machines.

2. The concepts of P, NP, NPC, and NP-hard

P : The class of decision problems that can be solved in polynomial time by a deterministic Turing machine (TM).

NP : The class of decision problems that can be solved in polynomial time by a nondeterministic Turing machine (NTM) (recall the decision tree: guess and verify!)

NPC : The problems in NP to which every problem in NP can be reduced in polynomial time.

NP-hard : The problems to which every problem in NP can be reduced in polynomial time; they may or may not be in NP themselves.
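To make "guess and verify" concrete, here is a small sketch using SAT as the example problem (my choice, not from the course notes). The clause encoding and the function names are assumptions: a clause is a list of nonzero integers, where `3` means variable 3 and `-3` means its negation. Verifying a guessed assignment takes time linear in the formula size, while the deterministic fallback below simply tries all 2^n guesses.

```python
from itertools import product


def verify_sat(clauses, assignment):
    # Polynomial-time verifier: every clause needs at least one true literal.
    # assignment maps variable number -> bool.
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def brute_force_sat(clauses, n_vars):
    # Deterministic stand-in for the "guess" step: enumerate every certificate.
    for bits in product([False, True], repeat=n_vars):
        assignment = {i + 1: b for i, b in enumerate(bits)}
        if verify_sat(clauses, assignment):
            return assignment
    return None
```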

**Artificial Intelligence :**

1. How to design an intelligent agent? Consider the agent's performance in solving problems.

2. Local search can be a good heuristic for hard problems (e.g. NP-hard ones), although it does not guarantee an optimal solution.
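A minimal sketch of local search, using steepest-descent hill climbing on the n-queens puzzle as the example (the puzzle and the names `conflicts` and `hill_climb` are my own picks for illustration). The state stores one queen per column, and each move relocates a single queen within its column.

```python
def conflicts(state):
    # Number of attacking queen pairs; state[col] is the row of that column's queen.
    n = len(state)
    return sum(
        1
        for a in range(n)
        for b in range(a + 1, n)
        if state[a] == state[b] or abs(state[a] - state[b]) == b - a
    )


def hill_climb(state):
    # Repeatedly take the best single-queen move until no move improves the score.
    state = list(state)
    current = conflicts(state)
    while True:
        best_move, best_score = None, current
        for col in range(len(state)):
            original = state[col]
            for row in range(len(state)):
                if row == original:
                    continue
                state[col] = row
                score = conflicts(state)
                if score < best_score:
                    best_move, best_score = (col, row), score
            state[col] = original
        if best_move is None:
            return state, current  # local minimum, not necessarily zero conflicts
        state[best_move[0]] = best_move[1]
        current = best_score
```

The search can stop at a local minimum with conflicts remaining; random restarts are the usual remedy.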

**Multi-Agent Meeting :**

1. A recommender system implemented with a collaborative filtering (CF) algorithm.

2. Collaborative filtering algorithm :

Idea : Transform the rating data into a rating matrix, use the matrix to find each target user's most similar users, and use those users' ratings to predict the target user's ratings.

CF can be classified into 3 main classes :

1. User-based : row vectors

2. Item-based : column vectors

3. Unifying user-based and item-based methods : row vectors & column vectors (can handle sparsity issues)

Comments : Using a clustering algorithm can handle scalability issues.
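Here is a small sketch of the user-based variant. The data layout (a dict of user to item ratings), cosine similarity over co-rated items, and the similarity-weighted average are all assumptions I picked for illustration; other similarity measures such as Pearson correlation are also common.

```python
import math


def cosine(u, v):
    # Cosine similarity of two users' rating rows, over co-rated items only.
    # Ratings are assumed to be positive numbers.
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item, k=2):
    # Similarity-weighted average of the k most similar users who rated the item.
    neighbours = sorted(
        ((cosine(ratings[user], row), row[item])
         for other, row in ratings.items()
         if other != user and item in row),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None  # nobody comparable has rated this item
    return sum(sim * rating for sim, rating in neighbours) / total
```

The item-based variant would apply the same idea to the columns of the rating matrix instead of the rows.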

# Bib Agent

Idea : Generate the metadata of papers automatically.

Step 1 : Use a PDF library to convert PDF files into plain-text files.

Step 2 : Design an agent to extract each paper's keywords, such as the paper title and the top-10 words of its abstract.

Step 3 : Use the keywords as a query to CiteSeer and parse the returned content to extract the paper's BibTeX section.

Step 4 : Use the BibTeX as the paper's metadata.
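A rough sketch of the two text-processing pieces of this plan: picking the top words of an abstract (Step 2) and pulling a BibTeX entry out of a returned page (Step 3). The stop-word list, the function names, and the brace-matching approach are assumptions of mine; the PDF conversion and the actual CiteSeer query are left out because they depend on whichever library and page format end up being used.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be longer.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "to", "is", "on", "we"}


def top_keywords(abstract, k=10):
    # Most frequent non-stop-words in the abstract, most common first.
    words = re.findall(r"[a-z]+", abstract.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]


def extract_bibtex(page_text):
    # Return the first "@type{...}" entry found, matching nested braces.
    match = re.search(r"@\w+\s*\{", page_text)
    if match is None:
        return None
    depth = 0
    for i in range(match.end() - 1, len(page_text)):
        if page_text[i] == "{":
            depth += 1
        elif page_text[i] == "}":
            depth -= 1
            if depth == 0:
                return page_text[match.start():i + 1]
    return None  # unbalanced braces: no complete entry
```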

Furthermore …

Use FOAF files to create a social network and use it to share or query papers.

# My first semester at NTU CSIE

My heart was calm …

But …

my mind kept on thinking …. : )

And just like that, the semester quietly went by …

# Hierarchical Clustering

Ref : http://www.elet.polimi.it/upload/matteucc/Clustering/tutorial_html/hierarchical.html

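single-linkage : the distance between two clusters is the shortest distance between their members

complete-linkage : the distance between two clusters is the longest distance between their members

average-linkage : the distance between two clusters is the average distance between their members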
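The three linkage rules can be sketched as follows, together with a naive agglomerative loop that keeps merging the two closest clusters. The 1-D points, the absolute-difference `distance`, and the stopping rule (stop at `k` clusters) are assumptions chosen to keep the example short.

```python
from itertools import product


def distance(p, q):
    # Assumed point distance: absolute difference of 1-D values.
    return abs(p - q)


def single_linkage(a, b):
    return min(distance(p, q) for p, q in product(a, b))


def complete_linkage(a, b):
    return max(distance(p, q) for p, q in product(a, b))


def average_linkage(a, b):
    return sum(distance(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def agglomerate(points, k, linkage):
    # Start with one cluster per point; merge the closest pair until k remain.
    clusters = [[p] for p in points]
    while len(clusters) > k:
        pairs = ((i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        del clusters[j]          # j > i, so index i is still valid
        clusters[i] = merged
    return clusters
```

For example, `agglomerate([1, 2, 10, 11, 20], 3, complete_linkage)` groups the points into `[1, 2]`, `[10, 11]`, and `[20]`.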