Swamped with homework — just writing a paper report almost made me pass out.
I had originally wanted to play with the Netflix dataset.
The research road really isn't an easy one; it all feels a bit formulaic …
C. Matuszek, M. Witbrock, R. Kahlert, J. Cabral, D. Schneider, P. Shah and D.B. Lenat. Searching for Common Sense: Populating Cyc from the Web. In Proceedings of the Twentieth National Conference on Artificial Intelligence, Pittsburgh, Pennsylvania, July 2005.
The paper proposes a mechanism for automating the gathering of common-sense knowledge for the Cyc knowledge base from the World Wide Web. In contrast to the traditional method, in which experts enter common sense into the Cyc KB manually, the proposed scheme is far more efficient.
Over the last twenty years, more than 3 million facts and rules have been entered into the Cyc knowledge base manually by ontologists. Following this traditional method clearly demands enormous effort from human experts and is not very efficient. Since the World Wide Web has become ubiquitous and holds a huge amount of human knowledge, designing a mechanism to automatically retrieve common-sense knowledge from the WWW is very appealing for the Cyc KB.

The automated gathering process works as follows. First, choose queries that cannot be answered by the Cyc KB and parse them into search strings using predefined templates. Second, send all the query strings to Google, which returns a large number of search results. Third, parse the search results into candidate GAFs (ground atomic formulas), then use inference to check the consistency between the existing facts and the new GAFs, discarding the inconsistent ones. Next, re-parse each consistent fact into a search string and check whether Google returns any results for it, discarding the GAFs with no hits. Finally, the Google-verified GAFs are passed to human experts, who review them for correctness, and the correct facts are asserted into the Cyc KB.

The work done here is immediately useful as a tool that makes human knowledge entry faster, easier, and more effective, but it is a pity that the queries this mechanism can generate are limited: only 233 predefined search templates over 134 binary predicates! This means the knowledge it can gather is restricted to those 134 pre-selected binary predicates. Finally, the question I ponder is why the experiment in the paper compares the non-sampled verified GAFs against sampled unverified GAFs, since one could cherry-pick the best sample of unverified GAFs to make the mechanism's performance look impressive.
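As a toy illustration of the consistency-checking step at the heart of the pipeline (the predicate, names, and facts here are hypothetical stand-ins, not the actual Cyc or Google interfaces), candidate GAFs that contradict the existing KB can be filtered out before verification and review:

```python
# Existing KB facts, each a (predicate, subject, object) triple.
KB = {("bornIn", "Mozart", "Salzburg")}

def contradicts(kb, gaf):
    # Toy consistency rule: treat bornIn as functional, i.e. a subject
    # can have only one birthplace; a new object for the same subject
    # conflicts with the stored fact.
    pred, subj, obj = gaf
    return any(p == pred and s == subj and o != obj for (p, s, o) in kb)

def filter_candidates(candidate_gafs):
    # Keep only GAFs consistent with the KB; the surviving ones would
    # then go through Google re-verification and human review.
    return [g for g in candidate_gafs if not contradicts(KB, g)]

hits = filter_candidates([
    ("bornIn", "Mozart", "Vienna"),   # conflicts with the KB: dropped
    ("bornIn", "Haydn", "Rohrau"),    # no conflict: kept
])
print(hits)   # [('bornIn', 'Haydn', 'Rohrau')]
```

The real system, of course, uses full logical inference over the whole KB rather than a single hand-coded functional-predicate check.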
D. B. Lenat. CYC: A Large-Scale Investment in Knowledge Infrastructure. Communication of the ACM, vol 38, pages 32-38, November, 1995.
The paper introduces an AI project called CYC, an ambitious effort to assemble a comprehensive ontology of everyday common-sense knowledge. CYC is a rule-based system containing millions of facts and rules describing common sense. Its goal is to support human-like reasoning in AI applications, such as enabling a computer to understand an article and draw conclusions about what it has read. In short, CYC has many powerful features and applications that seem promising for the future Semantic Web.
To make computers more intelligent and capable of simple human-like inference, it is necessary to build a knowledge-based system that lets a computer absorb knowledge and run an inference engine over it. CYC gathers common-sense knowledge from human experts and formalizes it in the CycL language, the representation used for every assertion stored in the CYC KB. CYC also has natural-language-processing features, so it can parse articles and transform them into CycL representations for inference. For example, after CYC has gone through some text, it de-contextualizes the text to determine which context it lies in. Different contexts carry different assertions and inference rules (we can see someone's heart in surgery, but not in a classroom; surgery and classroom are different contexts in this example). CYC's inference mechanism therefore explicitly identifies the different contexts and uses the appropriate rules and assertions to reason within each one. Moreover, all assertions in the CYC KB are assumed true by default, because it is hard to construct a standard for checking whether a common-sense assertion is true or false. In my opinion, although CYC seems powerful, it has many drawbacks: it lacks any meaningful benchmark or comparison for the efficiency of its inference engine, its inference is incomplete, gathering knowledge from manually entered expert data is inefficient, and so on. I think knowledge could instead be retrieved from the WWW using AI techniques, in an attempt to reduce human effort and increase system efficiency. The question I am pondering is that common sense varies across cultures and nations and changes as time passes; how CYC deals with this kind of problem seems crucial in real-world applications.
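The context idea above can be shown with a toy sketch (plain Python dictionaries, not CycL microtheories; the context names and the fact encoding are my own invention): the same question gets different answers depending on which context it is asked in.

```python
# Each context holds its own assertions; reasoning consults only the
# assertions of the context it is operating in.
contexts = {
    "SurgeryContext":   {("visible", "heart"): True},
    "ClassroomContext": {("visible", "heart"): False},
}

def holds(context, fact):
    # A fact not asserted in this context is simply not known to hold here.
    return contexts[context].get(fact, False)

print(holds("SurgeryContext",   ("visible", "heart")))   # True
print(holds("ClassroomContext", ("visible", "heart")))   # False
```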
A. Blum and M. Furst. Fast planning through planning graph analysis. In Proceedings of the International Joint Conference of Artificial Intelligence, pages 1636-1642, August 1995.
The paper proposes a mechanism for solving planning problems using a compact structure called the planning graph. The planner built on this paradigm, Graphplan, always returns a shortest plan or reports that no plan exists for a given planning problem, and it can also produce partially ordered plans. With its high performance and guaranteed soundness and completeness, Graphplan beats many well-known planners.
A planning problem asks a planner to find a sequence of actions leading from an initial state to a target goal. Since real-world planning problems can be encoded into representations a computer can understand and handle, designing a high-performance planner is crucial in many real applications.

The proposed mechanism works as follows. First, it constructs a data structure called the planning graph, which consists of stages expanded level by level, each stage containing a proposition level and an action level. Second, it adds mutual-exclusion constraints at each level, covering inconsistent effects, interference, competing needs, and inconsistent support; any solution Graphplan returns must violate none of these constraints. Third, it expands the graph level by level until the goal appears, then uses backward search with backtracking to check whether a valid plan exists. Finally, if no valid plan exists at this stage, it continues expanding the graph until it levels off; if the graph has leveled off and some goal literals do not appear, or are marked mutex, in the latest proposition level, the problem is unsolvable.

The planner requires only polynomial time and space to build the planning graph, and it is sound and complete. The experimental results show that Graphplan outperforms many total-order and partial-order planners on well-known AI planning problems. The questions I was pondering are why literals and actions increase monotonically while mutexes decrease monotonically in the planning graph, eventually leveling off, and whether the algorithm would remain sound and complete if we added a time factor to the planning problem.
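The forward-expansion and level-off behavior described above can be sketched in a few lines. This is a deliberately simplified sketch, with no mutex tracking and no backward search, on a hypothetical two-action domain; it only shows how proposition levels grow monotonically until the goals appear or the graph levels off.

```python
# Each action is a (name, preconditions, add-effects) tuple in a toy domain.
ACTIONS = [
    ("boil", {"have_water"}, {"hot_water"}),
    ("brew", {"hot_water", "leaves"}, {"tea"}),
]

def first_goal_level(initial, goals, actions):
    """Expand proposition levels until all goals appear (return that level)
    or the graph levels off with goals still missing (return None)."""
    props = set(initial)
    level = 0
    while True:
        if goals <= props:
            return level        # goals all present: Graphplan would now search backward
        applicable = [a for a in actions if a[1] <= props]
        new_props = props | {e for a in applicable for e in a[2]}
        if new_props == props:
            return None         # leveled off without the goals: unsolvable
        props = new_props
        level += 1

print(first_goal_level({"have_water", "leaves"}, {"tea"}, ACTIONS))  # 2
print(first_goal_level({"leaves"}, {"tea"}, ACTIONS))                # None
```

Since propositions are only ever added, the proposition sets grow monotonically, which is why the expansion must eventually level off.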
Chasing the ultimate in excitement, joy, sorrow
Netflix data set: billions of records; converting it into a user-item matrix
for analysis is hard to handle because the scale is so large.
Solution: apply SVD (Singular Value Decomposition) to the matrix.
SVD can reduce the original matrix to a much lower dimension while preserving some of its properties (to be checked).
Idea: define a feature vector for each movie, where each component represents some feature of the movie.
So let user A's preference vector be (1, 2, -1)
and let movie M's feature vector be (1, 4, -1);
then user A's rating of movie M is 1*1 + 2*4 + (-1)*(-1) = 1 + 8 + 1 = 10.
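The worked example above is just an inner product, so the prediction rule can be written directly:

```python
def predict_rating(user_vec, movie_vec):
    # Predicted rating = inner product of the user's preference vector
    # and the movie's feature vector.
    return sum(u * m for u, m in zip(user_vec, movie_vec))

print(predict_rating((1, 2, -1), (1, 4, -1)))   # 1 + 8 + 1 = 10
```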
Paper Recommendation :
Let users hand the papers on their computers to bibagent to fetch the BibTeX, and record each paper's save time on the user's machine. We can then apply time-weighting to identify which research areas the user's current working set of papers belongs to. To avoid asking users to rate papers themselves, we could take the citation number as an objective rating, or we can do it in this way:
rating of a paper = citation number / time since publication
So, we can design a website for users to upload the data retrieved by bibagent, and also let users create a FOAF-like data sheet, so that we can use that information to apply a CF algorithm.
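A hypothetical sketch of the two scoring signals described above; the function names, the units (years, days), and the 90-day half-life are my own assumptions for illustration, not anything produced by bibagent:

```python
import time

def citation_rating(citations, years_since_published):
    # "Objective" rating: citations per year since publication.
    return citations / max(years_since_published, 1e-9)

def recency_weight(save_time, now, half_life_days=90.0):
    # Time-weighted importance of a paper in the user's working set:
    # exponential decay of the save-time age, with an assumed 90-day half-life.
    age_days = (now - save_time) / 86400.0
    return 0.5 ** (age_days / half_life_days)

now = time.time()
print(citation_rating(120, 4))                           # 30.0 citations/year
print(round(recency_weight(now - 90 * 86400, now), 3))   # 0.5
```

Summing `recency_weight` per research area over the user's saved papers would identify the current working set; `citation_rating` could then feed the CF algorithm as the rating signal.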
Another point: for a given paper, can we identify the set of users interested in it, so that we can get recommendations more specific to that paper from this user set?