衝 !



  • TiVo application: It is a TV show recommender system implemented in decentralized architecture. Users’ preference on shows was presented in the form of rating (from score -N to +N) and the preference score was rated in the client side. The TiVo server periodically aggregate the clients’ rating and send the rating table via ring mechanism to the client. Each client received the rating table utilize the collaborative filtering mechanism to predict the rating score of the shows he/she has never seen. Next, after the predicting process was completed, the show recommender at client side will make show recommendations for its user.
  • Fab Application: It is a web pages recommender system integrated both the content-based and collaborative filtering approaches. In Fab, there are three main system components, namely collection agents, selection agents and the central router. Collection agents are designed to collect web page of a specific topic and central router will aggregate the collected pages. Next, central router will forward pages which related to users’ interest profiles to system users and users’ selection agents will make  selection to the forwarded pages (e.g. discard pages have seen by users). In this phrase, content based mechanisms are implemented to make recommendation from central router to users. In next phrase, the concept of collaborative filtering mechanisms are applied. Selection agents at user side will forward the pages to the nearest neighbors according to some metrics for similarity calculations. To sum up, Fab combine both the well-known recommendation approaches to produce hybrid recommendations, which was running under the distributed architecture.

我的 Proposal 核心

Tagging 的部份 :
由於CF的運算是要採取 co-rated items 的方式去做similarity比較
如果單純以 paper 作為 item,則使用者彼此之間的co-rated的item
set 一定不會很大,甚至很可能為空集合,而無法計算similarity。
因此比起傳統用SVD來降低Rating Table的維度(維度是要用tune的),
來增進co-rated的機會,在這邊我是利用Tag的方式來增進 co-rated
Space不一定會小於Item Space,但是我的目的只是要增進使用者間

Social Recommendation 的部份 :
傳統的CF運算是針對一個使用者(Target User)的行為,來找出系統中行
為跟他相似的 ” 朋友 “,進而分析他朋友喜歡的東西,來推薦給我們的Target
我們的 Target User 的朋友(儘管Target User實際上不認識他們),所以有
些學者也把CF歸類為Social Recommendation的一種方式。而在我的Proposal
中所提及的Social Recommendation除了會利用到CF的運算,此外還會考
Recommendation List的diversity(多樣化),因為使用者自行定義的Friends很可
能在CF的運算中並不會被納入為”朋友”的考量。而我所提的Social Recommendation

Personalized Recommendation推薦的東西比較傾向於使用者原本就喜歡的東西,

Social Recommendation推薦的東西比較傾向於使用者 “可能” 會喜歡的東西,可能
並且利用mining social relationship提供多樣化(diversity)的推薦列表。

在我的work中,並沒有傳統的Rating table使用者明確給予的評分來讓我跑推薦演算法。

Popularity Measurements

How to measure the popularity of a product can be discussed in three dimensions :

1. The quality of a product

2. The frequency of a product’s being purchased

3. The strong support of a product (big fans)

Paper report 1


A. Blum and M. Furst. Fast planning through planning graph analysis. In Proceedings of the International Joint Conference of Artificial Intelligence, pages 1636-1642, August 1995.


The paper proposed a mechanism for solving planning problem using a compact structure called planning graph. The planner Graphplan used in this paradigm can always return the shortest plan or answers no plan existed for any planning problem. Moreover, it can also solve partial ordering planning problems. With the great power of having high performance and guarantee soundness and completeness, Graphplan beat many well-known planners in the world.


Planning problems can be described of using planner to finding a sequence of actions from an initial state to achieve a target goal. Since real world planning problems can be encoded into representations that computer can understand and handle, designing a high performance planner is crucial in many real applications. The mechanism the paper proposed can be described as follows. First, it will construct a data structure called planning graph which was composed of many stages expending level by level, while each stage is composed of propositional level and action level. Second, it will add constraints (mutual exclusion) on each level, including inconsistent effects, interference, competing needs and inconsistent support. It means that if a solution returned by Graphplan, then all actions can not obey the constraints defined in this scheme. Third, it will expand the graph level by level until our goal appears, and it will use backtracking to see whether or not a valid plan exists. Finally, if no valid plan exists in this stage, it will continue to expand the graph until leveled-off. If the graph is leveled-off and some literals of the goal do not appear or are marked as mutex in the latest proposition level, the problem is unsolvable. The proposed planner only required polynomial time and space to run its algorithm and is sound and complete. From the view of experimental results, it shows that Graphplan outperforms many total-order and partial-order planners in many well-known AI planning problems. The question I was pondering about is that why literals and actions will increase monotonically and mutexes will decrease monotonically in the planning graph, and eventually level off. Moreover, if we add time factor into the planning problem, I don’t know whether or not the planner can guarantee that running this algorithm is sound and complete.

Research Issues

Netflix data set : 有好幾十億的資料,若要轉成 user-item matrix

做分析,則會因為 scale 太大而難以處理。

Solution : Apply SVD ( Singular Vector Decomposition) to the matrix.

SVD 可以把原始的 matrix 降至更低維度,並且保持一些性質 (待check)。

想法 : 幫每個movie定義一個 feature vactor,vector中的每個tuple代表


preference vector,vector中的每個tuple代表對movie中的每個性質之喜


So let user A’s preference vector is = ( 1 , 2 , -1 )

let Movie M’s feature vector is = ( 1 , 4 , -1 )

then user A’s rating on Movie M is 1*1 + 2*4 + -1*-1 = 1 + 8 + 1 = 10

Paper Recommendation :

讓使用者把自己電腦中的 paper 交給 bibagent去取得 bibtex 後,並紀

錄每篇 paper在使用者電腦中的save time。如此就可以apply time weighted

技術來 identify目前user的working set是哪些領域的paper。為了不讓user


number來當做客觀的rating,or we can do it in this way :

rating of a paper = citation number / published date

So , we can designe a web for user to upload the result of the retriving data made

by bibagent , then also let user to create the FOAF-like data sheet so that we can

use those infomation to apply CF-Algo .

Another point : 針對某篇 paper,我們是否能找出對此篇paper有興趣的user set,

so that we can get recommendations more specific to the paper from the user set .


Time issue in designing a RS

之前看了學長的 Time weighted CF 演算法成功了改進傳統


點是,一個人早期的 rating 資料的參考價值,不會比現在的

rating 資料的參考價值來的高。因此他設計了一個參考價值

依照時間飄逝 decrese 的曲線,也就是距離目前時間點愈近


使用者所做的 rating 參考價值是 1 ,則昨天此位 user 所做的

rating參考價值可能就是 0.9,如此把參考價值乘上實際的rating


改了一下 CF 演算法的核心部份,準確度就上升了。但是目前

在 RS 系統上改善 Accuracy 的主要盲點是在於大家都會想出一



新方法就可以真的達到目的。 As known as if p then q ( p -> q) ,

we can not guarantee that q -> p 。其實這和 overfitting 有很大

的關係,儘管我們用很複雜的 model 來讓這份萬年 training data

( MovieLens ) 得到最好的 training 效果,並不意味我們拿到一份

全然不同的 training data時也會有同樣的好效果。其實真正要能

評論 Accuracy 的只有使用者才算是最客觀的標準。 但是人又是