Homework has really been piling up lately

So much homework that just writing the paper reports nearly knocked me out

I had been meaning to play around with the netflix dataset

but the spirit is willing and the flesh is weak

The road of research really isn't easy; it all feels a bit formulaic …

I'll have to find some motivation to keep going ~


Paper report 3

Paper

C. Matuszek, M. Witbrock, R. Kahlert, J. Cabral, D. Schneider, P. Shah and D.B. Lenat. Searching for Common Sense: Populating Cyc from the Web. In Proceedings of the Twentieth National Conference on Artificial Intelligence, Pittsburgh, Pennsylvania, July 2005.

Report

The paper proposes a mechanism for automating the process of gathering common sense knowledge for the Cyc knowledge base from the World Wide Web. In contrast to the traditional approach, in which experts manually enter common sense assertions into the Cyc KB, the proposed scheme is considerably more efficient.


Over the last twenty years, more than 3 million facts and rules have been entered manually into the Cyc knowledge base by ontologists. It is obvious that following this traditional method takes a great deal of effort from human experts and is not very efficient. Since the World Wide Web has become more and more popular and holds a huge amount of human knowledge, designing a mechanism that automatically retrieves common sense knowledge from the web seems quite feasible for the Cyc KB.

The automated knowledge-gathering process can be described as follows. First, queries that cannot be answered by the Cyc KB are chosen and turned into search strings using predefined templates. Second, all of the query strings are passed to Google, which returns a large set of search results. Third, the search results are parsed into candidate GAFs (ground atomic formulas), and inference is used to check the consistency between the existing facts and the new GAFs, discarding the inconsistent ones. Next, each consistent fact is re-rendered as a search string and sent back to Google to check whether it returns any results; GAFs that return no search result are discarded. Finally, the Google-verified GAFs are passed to a human expert for a correctness review, and the facts judged correct are asserted into the Cyc KB.

The work done here is immediately useful as a tool that makes human knowledge entry faster, easier, and more effective, but it is a pity that the queries this mechanism can generate are limited: there are only 233 predefined search templates for 134 binary predicates, which means the knowledge that can be gathered is restricted to those pre-selected predicates. Finally, the question I keep pondering is why the experiment in the paper compares the non-sampled verified GAFs with the sampled unverified GAFs, since one could trickily choose the best sampled unverified GAFs to make the mechanism's performance look better than it is.
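To make the pipeline concrete, here is a minimal self-contained Python sketch of the search, parse, consistency-check, re-verify, and review loop. Everything in it (the tiny corpus, the toy parser, the single template, the stand-in search and review functions) is my own placeholder for illustration, not the paper's actual system or any real search API.

# Toy sketch (not the paper's actual code) of the search -> parse ->
# consistency check -> re-verify -> expert review loop described above.
def fake_search(query):
    # Stand-in for a Google query; returns matching snippets from a tiny corpus.
    corpus = [
        "Mozart was born in Salzburg.",
        "Mozart was born in 1756.",
    ]
    return [s for s in corpus if all(w in s for w in query.split())]

def parse_gafs(snippets, predicate):
    # Stand-in parser: turn "X was born in Y." into the GAF (predicate, X, Y).
    gafs = []
    for s in snippets:
        subject, _, rest = s.partition(" was born in ")
        if rest:
            gafs.append((predicate, subject, rest.rstrip(".")))
    return gafs

def consistent(kb, gaf):
    # Stand-in consistency check: reject a new value for a functional
    # predicate if the KB already records a different value for it.
    pred, subj, obj = gaf
    return all(not (p == pred and s == subj and o != obj) for (p, s, o) in kb)

def expert_review(gaf):
    # Stand-in for the human review step; approve everything in this demo.
    return True

kb = set()
templates = ["{subject} was born in"]                        # 1. predefined search templates
for subject in ["Mozart"]:
    for t in templates:
        snippets = fake_search(t.format(subject=subject))    # 2. web search
        for gaf in parse_gafs(snippets, "birthPlace"):       # 3. parse candidate GAFs
            if not consistent(kb, gaf):                      # 4. discard inconsistent facts
                continue
            if not fake_search(f"{gaf[1]} was born in {gaf[2]}"):  # 5. re-verify via search
                continue
            if expert_review(gaf):                           # 6. human review
                kb.add(gaf)                                  #    assert into the KB
print(kb)

In this toy run the fact "born in Salzburg" is asserted first, so the conflicting "born in 1756" candidate is rejected by the consistency check, which mirrors how inference filters the parsed GAFs before human review.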

Paper report 2

Paper

D. B. Lenat. CYC: A Large-Scale Investment in Knowledge Infrastructure. Communications of the ACM, vol. 38, pages 32-38, November 1995.

Report

The paper introduces an AI project called CYC, an ambitious effort to gather a comprehensive ontology of everyday common sense knowledge. CYC is a rule-based system containing millions of facts and rules describing common sense. The goal of CYC is to perform human-like reasoning in AI applications, such as enabling a computer to understand an article and draw conclusions from what it has read. Briefly, CYC has many powerful features and applications that seem to have feasible uses for the semantic web in the future.


In order to make computers more intelligent and able to do simple inference like humans, it is necessary to create a knowledge-based system that lets the computer absorb knowledge and use an inference engine to reason over it. CYC attempts to gather common sense knowledge from human experts and formalizes the gathered knowledge in the CycL language, which is the representation of every assertion stored in the CYC KB. CYC also has a natural language processing feature, so it can parse articles and transform them into CycL representations for inference. For example, after CYC has gone through some text, it de-contextualizes the text to determine which context the text lies in. Different contexts carry different assertions and inference rules (we can see someone's heart in surgery, but we cannot see someone's heart in a classroom; surgery and classroom are different contexts in this example). So the inference mechanism CYC uses is to explicitly identify the different contexts and apply the appropriate rules and assertions when reasoning. Moreover, all assertions in the CYC KB are assumed to be true by default, because it is hard to construct a standard for checking whether a common sense assertion is true or false. In my opinion, although CYC seems very powerful, it has many drawbacks: it lacks any meaningful benchmark or comparison for the efficiency of its inference engine, its inference is not complete, and gathering knowledge through experts' manual entry is not efficient. I think knowledge could be retrieved from the WWW using AI techniques in an attempt to reduce human effort and increase system efficiency. The question I am pondering is that common sense changes across cultures and nations and changes as time passes; how CYC deals with this kind of problem seems crucial for real-world applications.
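As a toy illustration of the context idea (my own Python sketch, not actual CycL), assertions can be stored per context so that the same query gets a different answer depending on which context is active:

# Toy illustration (not CycL): a fact holds only relative to the context
# in which it was asserted, as in the surgery vs. classroom example above.
assertions = {
    "SurgeryContext":   {("canSee", "surgeon", "patientsHeart")},
    "ClassroomContext": set(),   # seeing someone's heart is not asserted here
}

def holds(context, fact):
    # Look the fact up only in the assertions of the active context.
    return fact in assertions.get(context, set())

print(holds("SurgeryContext",   ("canSee", "surgeon", "patientsHeart")))  # True
print(holds("ClassroomContext", ("canSee", "surgeon", "patientsHeart")))  # False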

Paper report 1

Paper

A. Blum and M. Furst. Fast planning through planning graph analysis. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 1636-1642, August 1995.

Report

The paper proposes a mechanism for solving planning problems using a compact structure called a planning graph. The planner built on this paradigm, Graphplan, always returns a shortest plan or reports that no plan exists for a given planning problem. Moreover, it can also handle partial-order planning problems. With its high performance and its guarantees of soundness and completeness, Graphplan outperformed many well-known planners.


A planning problem can be described as using a planner to find a sequence of actions that leads from an initial state to a target goal. Since real-world planning problems can be encoded into representations a computer can understand and handle, designing a high-performance planner is crucial in many real applications. The mechanism proposed in the paper can be described as follows. First, it constructs a data structure called the planning graph, which is composed of stages expanded level by level, where each stage consists of a proposition level and an action level. Second, it adds mutual-exclusion constraints at each level, covering inconsistent effects, interference, competing needs, and inconsistent support; this means that in any solution returned by Graphplan, no chosen actions may violate these constraints. Third, it expands the graph level by level until the goal appears, and then uses backward-chaining search to check whether a valid plan exists. Finally, if no valid plan exists at the current stage, it continues to expand the graph until it levels off. If the graph has leveled off and some goal literals still do not appear, or are marked as mutex, in the latest proposition level, the problem is unsolvable. The proposed planner requires only polynomial time and space to run its graph-construction algorithm, and it is sound and complete. The experimental results show that Graphplan outperforms many total-order and partial-order planners on many well-known AI planning problems. The question I keep pondering is why literals and actions increase monotonically while mutexes decrease monotonically in the planning graph, so that it eventually levels off. Moreover, if we add a time factor to the planning problem, I do not know whether the planner can still guarantee that the algorithm is sound and complete.
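To make the expansion phase concrete, here is a simplified, self-contained Python sketch (my own illustration under simplifying STRIPS-style assumptions, not the authors' implementation). It builds the proposition and action levels with the mutex rules listed above and stops when the goal could appear or the graph levels off; the backward solution-extraction search is omitted.

# Simplified sketch of Graphplan's forward expansion (illustration only).
from itertools import combinations
from collections import namedtuple

Action = namedtuple("Action", ["name", "pre", "add", "delete"])

def noop(p):
    # Maintenance ("no-op") action that carries a proposition to the next level.
    return Action(f"noop({p})", frozenset([p]), frozenset([p]), frozenset())

def actions_mutex(a, b, prop_mutex):
    # Inconsistent effects: one action deletes what the other adds.
    if (a.delete & b.add) or (b.delete & a.add):
        return True
    # Interference: one action deletes a precondition of the other.
    if (a.delete & b.pre) or (b.delete & a.pre):
        return True
    # Competing needs: some pair of preconditions was mutex at the previous level.
    return any(frozenset((p, q)) in prop_mutex for p in a.pre for q in b.pre)

def expand(props, prop_mutex, actions):
    # Actions are applicable if their preconditions are present and pairwise non-mutex.
    level_actions = [a for a in actions
                     if a.pre <= props
                     and not any(frozenset((p, q)) in prop_mutex
                                 for p, q in combinations(a.pre, 2))]
    level_actions += [noop(p) for p in props]

    act_mutex = {frozenset((a.name, b.name))
                 for a, b in combinations(level_actions, 2)
                 if actions_mutex(a, b, prop_mutex)}

    next_props = frozenset(p for a in level_actions for p in a.add)
    producers = {p: [a for a in level_actions if p in a.add] for p in next_props}

    # Inconsistent support: two propositions are mutex if every pair of their
    # producing actions is mutex.
    next_mutex = {frozenset((p, q))
                  for p, q in combinations(next_props, 2)
                  if all(frozenset((a.name, b.name)) in act_mutex
                         for a in producers[p] for b in producers[q])}
    return next_props, next_mutex

def goal_possible(goal, props, prop_mutex):
    return goal <= props and not any(frozenset((p, q)) in prop_mutex
                                     for p, q in combinations(goal, 2))

# Tiny made-up example: move a package from A to B by loading, driving, unloading.
acts = [
    Action("load",   frozenset({"pkg_at_A", "truck_at_A"}),    frozenset({"pkg_in_truck"}), frozenset({"pkg_at_A"})),
    Action("drive",  frozenset({"truck_at_A"}),                frozenset({"truck_at_B"}),   frozenset({"truck_at_A"})),
    Action("unload", frozenset({"pkg_in_truck", "truck_at_B"}), frozenset({"pkg_at_B"}),    frozenset({"pkg_in_truck"})),
]
props, mutex = frozenset({"pkg_at_A", "truck_at_A"}), set()
goal, level = frozenset({"pkg_at_B"}), 0
while not goal_possible(goal, props, mutex):
    new_props, new_mutex = expand(props, mutex, acts)
    if (new_props, new_mutex) == (props, mutex):   # leveled off: goal unreachable
        print("no plan exists")
        break
    props, mutex, level = new_props, new_mutex, level + 1
else:
    print(f"goal may be reachable at proposition level {level}")

On this toy problem the goal literal first appears (non-mutex) at proposition level 3, which matches the shortest plan load, drive, unload; the full algorithm would then run backward search from that level to extract the plan.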

Don't forget …

My family's expectations

My childhood dreams

The sweet memories with 小乖

The hard times I once went through

I was meant to live an extreme life

chasing the utmost thrills, joys, and sorrows

I shouldn't let myself drift through peaceful, uneventful days

There are still so many things left undone

Family can't stay by my side forever

In the end everything depends on myself

Wake up

Research Issues

Netflix data set: it contains billions of entries, so converting it into a user-item matrix for analysis is hard to handle because the scale is so large.

Solution: apply SVD (Singular Value Decomposition) to the matrix.

SVD can reduce the original matrix to a lower dimension while preserving certain properties (to be checked).

Idea: define a feature vector for each movie, where each entry of the vector describes some property of the movie, e.g., how strong its action component is. Correspondingly, each user has a preference vector whose entries describe how much the user likes each of those properties.

So let user A's preference vector be (1, 2, -1)

and let movie M's feature vector be (1, 4, -1);

then user A's rating of movie M is 1*1 + 2*4 + (-1)*(-1) = 1 + 8 + 1 = 10
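A small Python sketch of this idea using numpy (the rating matrix below is made up for illustration): a truncated SVD of a tiny user-item matrix yields low-dimensional user and movie vectors, and a rating is predicted as their inner product.

# Toy sketch (made-up numbers): truncated SVD of a small rating matrix,
# then predict a rating as the inner product of a user vector and a movie vector.
import numpy as np

# rows = users, columns = movies; 0 means "not rated yet"
R = np.array([[5.0, 3.0, 0.0, 1.0],
              [4.0, 0.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 5.0],
              [0.0, 1.0, 5.0, 4.0]])

k = 2                                   # keep only the top-k singular values
U, s, Vt = np.linalg.svd(R, full_matrices=False)
user_vecs = U[:, :k] * s[:k]            # each row: a user's preference vector
movie_vecs = Vt[:k, :].T                # each row: a movie's feature vector

# Predicted rating of movie 2 by user 1 = dot product of the two vectors.
print(user_vecs[1] @ movie_vecs[2])

# The hand-written example from the notes works the same way:
pref, feat = np.array([1, 2, -1]), np.array([1, 4, -1])
print(pref @ feat)                      # 1*1 + 2*4 + (-1)*(-1) = 10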

Paper Recommendation:

Have the user hand the papers stored on their computer to bibagent to retrieve the BibTeX, and record each paper's save time on the user's machine. That way we can apply a time-weighted technique to identify which research areas the user's current working set of papers belongs to. To avoid making the user rate every paper by hand, we can fetch each paper's citation number from the web and use it as an objective rating, or we can do it this way:

rating of a paper = citation number / time since publication

So, we can design a web site for users to upload the data retrieved by bibagent, and also let users create a FOAF-like data sheet, so that we can use that information to apply a CF algorithm (a rough sketch of the rating idea follows below).
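A rough Python sketch of how such a rating and the time weighting might be computed. The function names, half-life, dates, and citation counts here are my own illustrative assumptions, not an existing bibagent API, and the rating interprets the formula above as citation count normalized by paper age.

# Illustrative sketch (not an existing bibagent API): rate each paper by
# citations per year of age, and weight papers by how recently they were saved.
from datetime import date
import math

def paper_rating(citations, published, today=date(2007, 1, 1)):
    # Newer papers have had less time to collect citations, so divide by age.
    age_years = max((today - published).days / 365.0, 0.1)
    return citations / age_years

def time_weight(saved, today=date(2007, 1, 1), half_life_days=90):
    # Recently saved papers count more toward the user's current working set.
    return math.exp(-math.log(2) * (today - saved).days / half_life_days)

papers = [  # (title, made-up citation count, published date, save time on the user's machine)
    ("Fast planning through planning graph analysis", 1200, date(1995, 8, 1), date(2006, 12, 20)),
    ("Searching for Common Sense", 60, date(2005, 7, 1), date(2006, 10, 1)),
]

for title, cites, pub, saved in papers:
    score = paper_rating(cites, pub) * time_weight(saved)
    print(f"{title}: {score:.1f}")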

Another point: for a given paper, can we find the set of users who are interested in that paper, so that we can get recommendations more specific to the paper from that user set? This is similar to the idea of finding experts.