• TiVo application: It is a TV show recommender system implemented in decentralized architecture. Users’ preference on shows was presented in the form of rating (from score -N to +N) and the preference score was rated in the client side. The TiVo server periodically aggregate the clients’ rating and send the rating table via ring mechanism to the client. Each client received the rating table utilize the collaborative filtering mechanism to predict the rating score of the shows he/she has never seen. Next, after the predicting process was completed, the show recommender at client side will make show recommendations for its user.
  • Fab Application: It is a web pages recommender system integrated both the content-based and collaborative filtering approaches. In Fab, there are three main system components, namely collection agents, selection agents and the central router. Collection agents are designed to collect web page of a specific topic and central router will aggregate the collected pages. Next, central router will forward pages which related to users’ interest profiles to system users and users’ selection agents will make  selection to the forwarded pages (e.g. discard pages have seen by users). In this phrase, content based mechanisms are implemented to make recommendation from central router to users. In next phrase, the concept of collaborative filtering mechanisms are applied. Selection agents at user side will forward the pages to the nearest neighbors according to some metrics for similarity calculations. To sum up, Fab combine both the well-known recommendation approaches to produce hybrid recommendations, which was running under the distributed architecture.

我的 Proposal 核心

Tagging 的部份 :
由於CF的運算是要採取 co-rated items 的方式去做similarity比較
如果單純以 paper 作為 item,則使用者彼此之間的co-rated的item
set 一定不會很大,甚至很可能為空集合,而無法計算similarity。
因此比起傳統用SVD來降低Rating Table的維度(維度是要用tune的),
來增進co-rated的機會,在這邊我是利用Tag的方式來增進 co-rated
Space不一定會小於Item Space,但是我的目的只是要增進使用者間

Social Recommendation 的部份 :
傳統的CF運算是針對一個使用者(Target User)的行為,來找出系統中行
為跟他相似的 ” 朋友 “,進而分析他朋友喜歡的東西,來推薦給我們的Target
我們的 Target User 的朋友(儘管Target User實際上不認識他們),所以有
些學者也把CF歸類為Social Recommendation的一種方式。而在我的Proposal
中所提及的Social Recommendation除了會利用到CF的運算,此外還會考
Recommendation List的diversity(多樣化),因為使用者自行定義的Friends很可
能在CF的運算中並不會被納入為”朋友”的考量。而我所提的Social Recommendation

Personalized Recommendation推薦的東西比較傾向於使用者原本就喜歡的東西,

Social Recommendation推薦的東西比較傾向於使用者 “可能” 會喜歡的東西,可能
並且利用mining social relationship提供多樣化(diversity)的推薦列表。

在我的work中,並沒有傳統的Rating table使用者明確給予的評分來讓我跑推薦演算法。

Popularity Measurements

How to measure the popularity of a product can be discussed in three dimensions :

1. The quality of a product

2. The frequency of a product’s being purchased

3. The strong support of a product (big fans)

Paper report 1


A. Blum and M. Furst. Fast planning through planning graph analysis. In Proceedings of the International Joint Conference of Artificial Intelligence, pages 1636-1642, August 1995.


The paper proposed a mechanism for solving planning problem using a compact structure called planning graph. The planner Graphplan used in this paradigm can always return the shortest plan or answers no plan existed for any planning problem. Moreover, it can also solve partial ordering planning problems. With the great power of having high performance and guarantee soundness and completeness, Graphplan beat many well-known planners in the world.


Planning problems can be described of using planner to finding a sequence of actions from an initial state to achieve a target goal. Since real world planning problems can be encoded into representations that computer can understand and handle, designing a high performance planner is crucial in many real applications. The mechanism the paper proposed can be described as follows. First, it will construct a data structure called planning graph which was composed of many stages expending level by level, while each stage is composed of propositional level and action level. Second, it will add constraints (mutual exclusion) on each level, including inconsistent effects, interference, competing needs and inconsistent support. It means that if a solution returned by Graphplan, then all actions can not obey the constraints defined in this scheme. Third, it will expand the graph level by level until our goal appears, and it will use backtracking to see whether or not a valid plan exists. Finally, if no valid plan exists in this stage, it will continue to expand the graph until leveled-off. If the graph is leveled-off and some literals of the goal do not appear or are marked as mutex in the latest proposition level, the problem is unsolvable. The proposed planner only required polynomial time and space to run its algorithm and is sound and complete. From the view of experimental results, it shows that Graphplan outperforms many total-order and partial-order planners in many well-known AI planning problems. The question I was pondering about is that why literals and actions will increase monotonically and mutexes will decrease monotonically in the planning graph, and eventually level off. Moreover, if we add time factor into the planning problem, I don’t know whether or not the planner can guarantee that running this algorithm is sound and complete.

Research Issues

Netflix data set : 有好幾十億的資料,若要轉成 user-item matrix

做分析,則會因為 scale 太大而難以處理。

Solution : Apply SVD ( Singular Vector Decomposition) to the matrix.

SVD 可以把原始的 matrix 降至更低維度,並且保持一些性質 (待check)。

想法 : 幫每個movie定義一個 feature vactor,vector中的每個tuple代表


preference vector,vector中的每個tuple代表對movie中的每個性質之喜


So let user A’s preference vector is = ( 1 , 2 , -1 )

let Movie M’s feature vector is = ( 1 , 4 , -1 )

then user A’s rating on Movie M is 1*1 + 2*4 + -1*-1 = 1 + 8 + 1 = 10

Paper Recommendation :

讓使用者把自己電腦中的 paper 交給 bibagent去取得 bibtex 後,並紀

錄每篇 paper在使用者電腦中的save time。如此就可以apply time weighted

技術來 identify目前user的working set是哪些領域的paper。為了不讓user


number來當做客觀的rating,or we can do it in this way :

rating of a paper = citation number / published date

So , we can designe a web for user to upload the result of the retriving data made

by bibagent , then also let user to create the FOAF-like data sheet so that we can

use those infomation to apply CF-Algo .

Another point : 針對某篇 paper,我們是否能找出對此篇paper有興趣的user set,

so that we can get recommendations more specific to the paper from the user set .


Time issue in designing a RS

之前看了學長的 Time weighted CF 演算法成功了改進傳統


點是,一個人早期的 rating 資料的參考價值,不會比現在的

rating 資料的參考價值來的高。因此他設計了一個參考價值

依照時間飄逝 decrese 的曲線,也就是距離目前時間點愈近


使用者所做的 rating 參考價值是 1 ,則昨天此位 user 所做的

rating參考價值可能就是 0.9,如此把參考價值乘上實際的rating


改了一下 CF 演算法的核心部份,準確度就上升了。但是目前

在 RS 系統上改善 Accuracy 的主要盲點是在於大家都會想出一



新方法就可以真的達到目的。 As known as if p then q ( p -> q) ,

we can not guarantee that q -> p 。其實這和 overfitting 有很大

的關係,儘管我們用很複雜的 model 來讓這份萬年 training data

( MovieLens ) 得到最好的 training 效果,並不意味我們拿到一份

全然不同的 training data時也會有同樣的好效果。其實真正要能

評論 Accuracy 的只有使用者才算是最客觀的標準。 但是人又是



Mail 相關術語,分析 Mail 所建立的 social networks

TO : 郵件的主要接收人

CC : 全名是 Carbon Copy ,副本。

BCC : 全名是 Blind Carbon Copy,密件副本。

在分析 Mail 所建立的 social networks 時,郵件彼此傳送之間

傳送者與接收者的重要性可以由 TO , CC , BCC 看出。 比如一

封 MAIL 如果 TO 很多人,則這封郵件對於每個人的重要性就

沒有來比TO 單一個人來的高。當然,BCC給某個人,則代表



在分析 Social network 所做的 Visualization 目的是為了讓使用

者能更直觀的看出人與人之間的關係,比如兩個 Nodes之間的遠


但是當 Network 的 Scale 很大時,做Visualization可能就會遭遇呈現

的問題吧,想像著要把好幾千人所組成的 Network 秀在一個小螢幕上

又可以同時讓使用者了解每個 node 是代表哪位人物的確不是很容易


Recommender Systems – Trust

How to make user trust our recommender system is a crucial

design goal . Image that if a RS always recommend items that

a user have never seen , so the items will be strange to the user.

In other words , the RS will not be so friendly to the user . So

how to make RS seems that it can understand the user’s psychology

is a very important point. It’s just like that RS play a friend role of the

user . In fact , it has simple way to do this : try to recommend items

that the user have ever seen , then let user feel friendly toward our

systems just like his/her friend . So , accuracy is not enough in designing

a RS , we must consider other factors like HCI perspectives in order to

let users easily control our RS.

PR Notes

Pattern recognition — the act of taking in raw data and taking an action based on the “category” of the pattern .

Segmentation operation in which the images of different fish are somehow isolated from one another and from the background. 比如把某隻魚從其他的魚中獨立出來並且去除背景,此動作叫做Segmentation。

Feature 的好壞 : 當一個 feature f1 和 feature f2 有相關性 ( 比如是正相關 ) .
則同時選擇 f1 和 f2 當 feature並沒有什麼意義,對 classification 的效果也不會提升。

用太複雜的 model 來做 decision 並不一定是好的,很難有 generalization 的效果。

Given that there truly are differences between the population of sea bass and that
of salmon, we view them as having different models — different descriptions, which
are typically mathematical in form. 盧魚和鮭魚顯然有著不同的特質,比如外觀,
長度,顏色等。諸如此類的特性,我們稱之為 Model。我們常常會用數學式來描述
Model,比如說長度 > 50 以上就是盧魚,<=50就是鮭魚。此為簡單的Model,
因此為了要區分盧魚和鮭魚所做出的decision boundary就很簡單。

Syntactic pattern recognition, where rules or grammars describe our decision. For ex-
ample we might wish to classify an English sentence as grammatical or not, and here
statistical descriptions (word frequencies, word correlations, etc.) are inapapropriate.
Syntactic的方式比較類似定 rule 來描述我們的 decision。

Pattern 表示法可以採用 vector 表示。