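Recently I decided to implement the main recommender-system methods one by one, using the code from the book "Programming Collective Intelligence". Partly this is to become more fluent in Python, and partly to understand the concepts behind recommender systems better. Today's post is the first one: user-based recommendation. What does that mean? It means making recommendations from the relationships between users. Still not clear? Let me borrow an example from the book Recommender System Practice (《推荐系统实践》). At the start of every semester, the new students who have just joined the lab ask their seniors which books and papers they should read. As seniors, we naturally tell them. First, they trust us, so we have to recommend responsibly. Second, they are in the same lab as us, so their research direction and the things they need to learn are basically the same as ours, which means we share common interests. Clever as you are, you can surely see now what user-based recommendation is: find the users whose tastes are most similar to yours, and recommend what they liked. In the code below, that means measuring how similar each other user's ratings are to yours, then scoring every item you have not rated by a similarity-weighted average of the ratings given by the positively similar users.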
OK, that's the idea. Now for the code!
```python
# This is for user-based collaborative filtering recommendation algorithms.
# I hope it can help us. I learned it from the book
# "Programming Collective Intelligence".
# If you have any questions, please let me know.
# My QQ is 354475072
# My blog is http://blog.csdn.net/wbgxx333

# Make "/" true division on Python 2 as well, so integer ratings still work
from __future__ import division

import io
from math import sqrt


# Compute a distance-based similarity score for person1 and person2
def sim_distance(prefs, person1, person2):
    # Get the list of shared items
    si = {}
    for item in prefs[person1]:
        if item in prefs[person2]:
            si[item] = 1

    # If they have no ratings in common, return 0
    if len(si) == 0:
        return 0

    # Add up the squares of all the differences
    sum_of_squares = sum([pow(prefs[person1][item] - prefs[person2][item], 2)
                          for item in si])
    return 1 / (1 + sum_of_squares)


# Compute the Pearson correlation coefficient for p1 and p2
def sim_pearson(prefs, p1, p2):
    # Get the list of mutually rated items
    # (fixed: the original nested a second for loop here instead of testing
    # membership, which collected p2's items and raised KeyError on p1)
    si = {}
    for item in prefs[p1]:
        if item in prefs[p2]:
            si[item] = 1

    # If they have no ratings in common, return 0
    n = len(si)
    if n == 0:
        return 0

    # Sums of all the preferences
    sum1 = sum([prefs[p1][it] for it in si])
    sum2 = sum([prefs[p2][it] for it in si])

    # Sums of the squares
    sum1sq = sum([pow(prefs[p1][it], 2) for it in si])
    sum2sq = sum([pow(prefs[p2][it], 2) for it in si])

    # Sum of the products
    psum = sum([prefs[p1][it] * prefs[p2][it] for it in si])

    # Calculate r (Pearson score)
    num = psum - (sum1 * sum2 / n)
    den = sqrt((sum1sq - pow(sum1, 2) / n) * (sum2sq - pow(sum2, 2) / n))
    if den == 0:
        return 0
    return num / den


# Return the best matches for person from the prefs dictionary.
# Number of results and similarity function are optional params.
def topMatches(prefs, person, n=5, similarity=sim_pearson):
    scores = [(similarity(prefs, person, other), other)
              for other in prefs if other != person]
    scores.sort()
    scores.reverse()
    return scores[0:n]


# Get recommendations for a person by using a weighted average
# of every other user's rankings
def getRecommendations(prefs, person, similarity=sim_pearson):
    totals = {}
    simsums = {}
    for other in prefs:
        # Don't compare me to myself
        if other == person:
            continue
        sim = similarity(prefs, person, other)

        # Ignore scores of zero or lower
        if sim <= 0:
            continue
        for item in prefs[other]:
            # Only score movies I haven't seen yet
            if item not in prefs[person] or prefs[person][item] == 0:
                # Similarity * score
                totals.setdefault(item, 0)
                totals[item] += prefs[other][item] * sim
                # Sum of similarities
                simsums.setdefault(item, 0)
                simsums[item] += sim

    # Create the normalized list
    rankings = [(total / simsums[item], item) for item, total in totals.items()]

    # Return the sorted list, best first
    rankings.sort()
    rankings.reverse()
    return rankings


# Load the MovieLens 100k files u.item and u.data from path
def loadMovieLens(path='D:/Python27/data'):
    # Get movie titles (u.item is Latin-1 encoded, so name the encoding
    # explicitly; io.open accepts it on both Python 2 and 3)
    movies = {}
    with io.open(path + '/u.item', encoding='latin-1') as f:
        for line in f:
            (movie_id, title) = line.split('|')[0:2]
            movies[movie_id] = title

    # Load data: each line of u.data is user, movie id, rating, timestamp
    # separated by tabs
    prefs = {}
    with io.open(path + '/u.data', encoding='latin-1') as f:
        for line in f:
            (user, movieid, rating, ts) = line.split('\t')
            prefs.setdefault(user, {})
            prefs[user][movies[movieid]] = float(rating)
    return prefs
```
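For reference, here are the three quantities the code computes, written out as formulas. The notation is mine, not the book's: $S$ is the set of items rated by both users $a$ and $b$ (the dictionary `si` in the code), $n = |S|$, and $r_{u,i}$ is user $u$'s rating of item $i$. As written, `sim_distance` uses the sum of squared differences directly, without a square root; it is still 1 for identical ratings and shrinks toward 0 as the ratings diverge. Both similarity functions return 0 when $S$ is empty, and `sim_pearson` also returns 0 when its denominator is 0.

$$
\mathrm{sim}_{\mathrm{dist}}(a,b) = \frac{1}{1 + \sum_{i \in S} \left(r_{a,i} - r_{b,i}\right)^2}
$$

$$
\mathrm{sim}_{\mathrm{pearson}}(a,b) = \frac{\sum_{i \in S} r_{a,i}\, r_{b,i} - \frac{1}{n}\sum_{i \in S} r_{a,i} \sum_{i \in S} r_{b,i}}{\sqrt{\left(\sum_{i \in S} r_{a,i}^2 - \frac{1}{n}\Big(\sum_{i \in S} r_{a,i}\Big)^2\right)\left(\sum_{i \in S} r_{b,i}^2 - \frac{1}{n}\Big(\sum_{i \in S} r_{b,i}\Big)^2\right)}}
$$

`getRecommendations` then scores each item $i$ that user $u$ has not rated as

$$
\hat{r}_{u,i} = \frac{\sum_{v \in N(u,i)} \mathrm{sim}(u,v)\, r_{v,i}}{\sum_{v \in N(u,i)} \mathrm{sim}(u,v)},
$$

where $N(u,i)$ is the set of other users $v$ with $\mathrm{sim}(u,v) > 0$ who rated $i$ (the numerator is `totals[item]`, the denominator is `simsums[item]`).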
That's the whole listing. For a detailed explanation, see "Programming Collective Intelligence", or feel free to ask me. :)
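To see why `sim_pearson` is the default similarity, here is a small self-contained check. It contains condensed copies of `sim_distance` and `sim_pearson` (same formulas and the same `prefs, p1, p2` signature as the listing; the helper `shared_items` is my own), run on hypothetical toy ratings in which one user is always exactly 2 points harsher than the other. The distance score treats them as fairly dissimilar, while the Pearson score, which only asks whether the two users' ratings move together, rates them as perfectly correlated.

```python
from math import sqrt


def shared_items(prefs, p1, p2):
    # Items that both users have rated (hypothetical helper)
    return [item for item in prefs[p1] if item in prefs[p2]]


def sim_distance(prefs, p1, p2):
    # 1 / (1 + sum of squared differences), same formula as the listing
    si = shared_items(prefs, p1, p2)
    if not si:
        return 0
    return 1.0 / (1 + sum((prefs[p1][it] - prefs[p2][it]) ** 2 for it in si))


def sim_pearson(prefs, p1, p2):
    # Pearson correlation over the shared items, same formula as the listing
    si = shared_items(prefs, p1, p2)
    n = len(si)
    if n == 0:
        return 0
    sum1 = sum(prefs[p1][it] for it in si)
    sum2 = sum(prefs[p2][it] for it in si)
    sum1sq = sum(prefs[p1][it] ** 2 for it in si)
    sum2sq = sum(prefs[p2][it] ** 2 for it in si)
    psum = sum(prefs[p1][it] * prefs[p2][it] for it in si)
    num = psum - sum1 * sum2 / n
    den = sqrt((sum1sq - sum1 ** 2 / n) * (sum2sq - sum2 ** 2 / n))
    if den == 0:
        return 0
    return num / den


# Hypothetical toy ratings: 'tough' always gives 2 points less than 'generous'
prefs = {
    'generous': {'Movie A': 5.0, 'Movie B': 4.0, 'Movie C': 3.0},
    'tough': {'Movie A': 3.0, 'Movie B': 2.0, 'Movie C': 1.0},
}

print(sim_distance(prefs, 'generous', 'tough'))  # 1/13, about 0.077
print(sim_pearson(prefs, 'generous', 'tough'))   # 1.0
```

The price is that the Pearson score needs variation to work with: with only one shared item, or when a user gives every shared item the same score, the denominator is 0 and the function returns 0.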
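Note also that this is only a bare-bones implementation. If you work with different data, you will need to look at its format yourself and write the loading code to match.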
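As an illustration only, here is a sketch of a loader for a hypothetical comma-separated file with a header row and one `user,item,rating` triple per line. The file format, the column names, and the function name `load_csv_prefs` are assumptions, not something from the book or from MovieLens. It builds the same nested `{user: {item: rating}}` dictionary that `sim_pearson`, `topMatches`, and `getRecommendations` expect, and it is written for Python 3.

```python
import csv
import io


def load_csv_prefs(lines):
    """Build {user: {item: rating}} from CSV lines that start with a header row.

    Assumes (hypothetically) that the columns are named user, item and rating.
    `lines` can be an open text file or any iterable of strings.
    """
    prefs = {}
    for row in csv.DictReader(lines):
        prefs.setdefault(row['user'], {})
        prefs[row['user']][row['item']] = float(row['rating'])
    return prefs


# Usage with an in-memory file; for a real file you would pass
# open('ratings.csv', newline='') instead (hypothetical file name).
sample = io.StringIO(
    "user,item,rating\n"
    "alice,Star Wars (1977),5\n"
    "alice,Fargo (1996),4\n"
    "bob,Star Wars (1977),3\n"
)
prefs = load_csv_prefs(sample)
print(prefs)
# {'alice': {'Star Wars (1977)': 5.0, 'Fargo (1996)': 4.0}, 'bob': {'Star Wars (1977)': 3.0}}
```

Using `csv.DictReader` rather than `line.split(',')` means quoted fields such as `"Good, The Bad and The Ugly, The (1966)"` are parsed correctly.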
Here are the results of a test run:
>>> prefs=loadMovieLens()
>>> prefs['87']
{'Birdcage, The (1996)': 4.0, 'E.T. the Extra-Terrestrial (1982)': 3.0, 'Bananas (1971)': 5.0, 'Sting, The (1973)': 5.0, 'Bad Boys (1995)': 4.0, 'In the Line of Fire (1993)': 5.0, 'Star Trek: The Wrath of Khan (1982)': 5.0, 'Speechless (1994)': 4.0, 'Mission: Impossible (1996)': 4.0, 'Return of the Pink Panther, The (1974)': 4.0, 'Under Siege (1992)': 4.0, 'I.Q. (1994)': 5.0, 'Evil Dead II (1987)': 2.0, 'Heat (1995)': 3.0, 'Naked Gun 33 1/3: The Final Insult (1994)': 4.0, 'Star Trek III: The Search for Spock (1984)': 4.0, 'Executive Decision (1996)': 3.0, 'Endless Summer 2, The (1994)': 3.0, 'Serial Mom (1994)': 1.0, 'Butch Cassidy and the Sundance Kid (1969)': 5.0, 'GoldenEye (1995)': 4.0, 'Private Benjamin (1980)': 4.0, 'Boot, Das (1981)': 4.0, "City Slickers II: The Legend of Curly's Gold (1994)": 3.0, 'Heathers (1989)': 3.0, 'That Old Feeling (1997)': 4.0, 'Brady Bunch Movie, The (1995)': 2.0, 'Good, The Bad and The Ugly, The (1966)': 5.0, 'Down Periscope (1996)': 4.0, "Ulee's Gold (1997)": 3.0, 'Jeffrey (1995)': 3.0, 'Strange Days (1995)': 3.0, 'Dave (1993)': 4.0, 'Demolition Man (1993)': 3.0, 'Reality Bites (1994)': 3.0, 'Big Green, The (1995)': 3.0, 'Get Shorty (1995)': 5.0, 'Manchurian Candidate, The (1962)': 4.0, 'Batman & Robin (1997)': 4.0, 'Stargate (1994)': 5.0, 'Dead Man Walking (1995)': 4.0, 'Clear and Present Danger (1994)': 5.0, 'Net, The (1995)': 5.0, 'Ed Wood (1994)': 3.0, 'Fugitive, The (1993)': 5.0, 'Clockwork Orange, A (1971)': 4.0, 'Victor/Victoria (1982)': 4.0, "Joe's Apartment (1996)": 2.0, 'Magnificent Seven, The (1954)': 5.0, 'Star Wars (1977)': 5.0, 'To Die For (1995)': 3.0, 'Bridge on the River Kwai, The (1957)': 5.0, 'Maverick (1994)': 3.0, 'Full Metal Jacket (1987)': 4.0, 'Vegas Vacation (1997)': 4.0, 'Pulp Fiction (1994)': 4.0, 'Strictly Ballroom (1992)': 3.0, 'Days of Thunder (1990)': 5.0, 'Something to Talk About (1995)': 2.0, 'Son in Law (1993)': 4.0, 'That Thing You Do! (1996)': 4.0, "Schindler's List (1993)": 4.0, 'Tommy Boy (1995)': 4.0, 'Jimmy Hollywood (1994)': 3.0, 'Clueless (1995)': 4.0, 'Wizard of Oz, The (1939)': 5.0, 'Dances with Wolves (1990)': 5.0, 'Multiplicity (1996)': 3.0, 'Young Frankenstein (1974)': 5.0, 'Jack (1996)': 3.0, 'Big Squeeze, The (1996)': 2.0, 'Godfather, The (1972)': 4.0, 'Barcelona (1994)': 3.0, 'Milk Money (1994)': 4.0, 'Mrs. Doubtfire (1993)': 4.0, 'Cops and Robbersons (1994)': 3.0, 'So I Married an Axe Murderer (1993)': 2.0, 'Groundhog Day (1993)': 5.0, 'Four Weddings and a Funeral (1994)': 5.0, 'Home Alone (1990)': 4.0, 'Terminator 2: Judgment Day (1991)': 5.0, 'Boomerang (1992)': 3.0, 'Ace Ventura: Pet Detective (1994)': 4.0, 'Great White Hype, The (1996)': 3.0, 'Die Hard: With a Vengeance (1995)': 4.0, 'Fargo (1996)': 5.0, 'Fish Called Wanda, A (1988)': 5.0, 'Prefontaine (1997)': 5.0, 'Young Guns (1988)': 3.0, 'Empire Strikes Back, The (1980)': 5.0, 'Citizen Kane (1941)': 4.0, 'Dumb & Dumber (1994)': 4.0, 'Crow, The (1994)': 3.0, 'Swimming with Sharks (1995)': 3.0, '2001: A Space Odyssey (1968)': 5.0, 'Matilda (1996)': 3.0, 'Man of the House (1995)': 3.0, 'Star Trek: The Motion Picture (1979)': 3.0, 'Return of the Jedi (1983)': 5.0, 'Grumpier Old Men (1995)': 4.0, 'Jurassic Park (1993)': 5.0, 'Treasure of the Sierra Madre, The (1948)': 4.0, 'Renaissance Man (1994)': 5.0, 'Program, The (1993)': 3.0, "Monty Python's Life of Brian (1979)": 4.0, 'Sneakers (1992)': 4.0, 'Twister (1996)': 4.0, 'GoodFellas (1990)': 4.0, "Dante's Peak (1997)": 3.0, 'Adventures of Priscilla, Queen of the Desert, The (1994)': 3.0, 'Switchblade Sisters (1975)': 2.0, 'Dragonheart (1996)': 4.0, 'Lightning Jack (1994)': 3.0, 'River Wild, The (1994)': 4.0, 'Raiders of the Lost Ark (1981)': 5.0, 'Air Up There, The (1994)': 3.0, "Pyromaniac's Love Story, A (1995)": 3.0, 'Young Guns II (1990)': 2.0, 'Die Hard (1988)': 4.0, 'Top Gun (1986)': 5.0, 'Truth About Cats & Dogs, The (1996)': 4.0, 'While You Were Sleeping (1995)': 5.0, 'Braveheart (1995)': 4.0, 'Raising Arizona (1987)': 3.0, 'Batman (1989)': 3.0, 'To Kill a Mockingbird (1962)': 4.0, 'Mother (1996)': 2.0, 'Kingpin (1996)': 4.0, 'Supercop (1992)': 3.0, 'Dunston Checks In (1996)': 1.0, 'Deer Hunter, The (1978)': 3.0, 'Up in Smoke (1978)': 3.0, 'Cool Hand Luke (1967)': 5.0, 'Wyatt Earp (1994)': 3.0, 'Annie Hall (1977)': 4.0, 'Blues Brothers, The (1980)': 5.0, 'True Lies (1994)': 5.0, 'Independence Day (ID4) (1996)': 5.0, 'Professional, The (1994)': 4.0, "It's a Wonderful Life (1946)": 5.0, 'Blade Runner (1982)': 4.0, 'Low Down Dirty Shame, A (1994)': 3.0, 'Baby-Sitters Club, The (1995)': 2.0, 'Sabrina (1995)': 4.0, 'I Love Trouble (1994)': 3.0, 'Mask, The (1994)': 3.0, 'Indiana Jones and the Last Crusade (1989)': 5.0, 'Nine Months (1995)': 4.0, 'French Kiss (1995)': 5.0, 'Shawshank Redemption, The (1994)': 5.0, 'Batman Returns (1992)': 3.0, 'Addams Family Values (1993)': 2.0, 'Junior (1994)': 4.0, 'Adventures of Robin Hood, The (1938)': 5.0, 'Mars Attacks! (1996)': 3.0, 'Waterworld (1995)': 4.0, 'Major Payne (1994)': 3.0, 'Con Air (1997)': 4.0, 'Sleepers (1996)': 4.0, 'Air Force One (1997)': 3.0, 'Alien (1979)': 4.0, 'Nutty Professor, The (1996)': 4.0, 'Coneheads (1993)': 4.0, 'Raging Bull (1980)': 3.0, "Singin' in the Rain (1952)": 4.0, 'In the Army Now (1994)': 4.0, 'Glory (1989)': 4.0, 'Star Trek IV: The Voyage Home (1986)': 5.0, 'Forget Paris (1995)': 4.0, 'M*A*S*H (1970)': 5.0, 'Platoon (1986)': 3.0, 'House Arrest (1996)': 3.0, 'Speed 2: Cruise Control (1997)': 3.0, 'Terminator, The (1984)': 5.0, 'To Wong Foo, Thanks for Everything! Julie Newmar (1995)': 3.0, 'Cliffhanger (1993)': 3.0, 'Speed (1994)': 5.0, 'Desperado (1995)': 3.0, 'Michael (1996)': 4.0, 'Conan the Barbarian (1981)': 3.0, 'Hoop Dreams (1994)': 4.0, 'Mighty Aphrodite (1995)': 3.0, 'Twelve Monkeys (1995)': 4.0, 'Sleepless in Seattle (1993)': 5.0, 'My Favorite Year (1982)': 3.0, 'Sleeper (1973)': 4.0, 'Searching for Bobby Fischer (1993)': 4.0, 'Apocalypse Now (1979)': 4.0, 'Addicted to Love (1997)': 4.0, 'Hot Shots! Part Deux (1993)': 4.0, 'Quiet Man, The (1952)': 5.0, 'Babe (1995)': 5.0, 'When Harry Met Sally... (1989)': 5.0, 'Star Trek: First Contact (1996)': 4.0, 'American President, The (1995)': 5.0, 'Shadow, The (1994)': 3.0, 'Muppet Treasure Island (1996)': 3.0, 'Santa Clause, The (1994)': 4.0, 'Dead Poets Society (1989)': 5.0, 'First Wives Club, The (1996)': 2.0, 'Lost World: Jurassic Park, The (1997)': 3.0, 'Inkwell, The (1994)': 3.0, 'Broken Arrow (1996)': 3.0, 'Hard Target (1993)': 4.0, 'Grease (1978)': 4.0, 'This Is Spinal Tap (1984)': 5.0, 'Back to the Future (1985)': 5.0, "Weekend at Bernie's (1989)": 3.0, 'Cowboy Way, The (1994)': 3.0, 'Striptease (1996)': 2.0}
>>> getRecommendations(prefs,'87')[0:30]
[(5.0, 'They Made Me a Criminal (1939)'), (5.0, 'Star Kid (1997)'), (5.0, 'Santa with Muscles (1996)'), (5.0, 'Saint of Fort Washington, The (1993)'), (5.0, 'Marlene Dietrich: Shadow and Light (1996) '), (5.0, 'Great Day in Harlem, A (1994)'), (5.0, 'Entertaining Angels: The Dorothy Day Story (1996)'), (5.0, 'Boys, Les (1997)'), (4.89884443128923, 'Legal Deceit (1997)'), (4.815019082242709, 'Letter From Death Row, A (1998)'), (4.7321082983941425, 'Hearts and Minds (1996)'), (4.696244466490867, 'Pather Panchali (1955)'), (4.652397061026758, 'Lamerica (1994)'), (4.538723693474813, 'Leading Man, The (1996)'), (4.535081339106103, 'Mrs. Dalloway (1997)'), (4.532337612572981, 'Innocents, The (1961)'), (4.527998574747079, 'Casablanca (1942)'), (4.510270149719864, 'Everest (1998)'), (4.493967755428439, 'Dangerous Beauty (1998)'), (4.485151301801342, 'Wallace & Gromit: The Best of Aardman Animation (1996)'), (4.463287461290222, 'Wrong Trousers, The (1993)'), (4.450979436941035, 'Kaspar Hauser (1993)'), (4.431079071179518, 'Usual Suspects, The (1995)'), (4.427520682864959, 'Maya Lin: A Strong Clear Vision (1994)'), (4.414870784592075, 'Wedding Gift, The (1994)'), (4.377445252656464, 'Affair to Remember, An (1957)'), (4.376071110447771, 'Good Will Hunting (1997)'), (4.376011099001396, 'As Good As It Gets (1997)'), (4.374146179500976, 'Anna (1996)'), (4.367437266504598, 'Close Shave, A (1995)')]
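One thing worth noticing in these results: because each score is a similarity-weighted average, a movie scores 5.0 whenever every positively correlated user who rated it gave it a 5. That is much easier to hit for an obscure movie rated by only one or two such users than for a popular one, which likely explains the run of little-known titles at the top. A common remedy is to require a minimum number of contributing raters. The sketch below is one hypothetical way to do that; the function name `get_recommendations_min_support`, the `min_raters` parameter, and the stand-in similarity are my own additions, not from the book.

```python
def get_recommendations_min_support(prefs, person, similarity, min_raters=2):
    """Like getRecommendations, but drop items that fewer than min_raters
    positively similar users have rated (hypothetical variant)."""
    totals, sim_sums, counts = {}, {}, {}
    for other in prefs:
        if other == person:
            continue
        sim = similarity(prefs, person, other)
        if sim <= 0:
            continue
        for item, rating in prefs[other].items():
            # Only score items the person hasn't rated (or rated 0)
            if item not in prefs[person] or prefs[person][item] == 0:
                totals[item] = totals.get(item, 0) + rating * sim
                sim_sums[item] = sim_sums.get(item, 0) + sim
                counts[item] = counts.get(item, 0) + 1
    rankings = [(totals[item] / sim_sums[item], item)
                for item in totals if counts[item] >= min_raters]
    rankings.sort(reverse=True)
    return rankings


# Stand-in similarity that treats every pair of users as equally similar;
# with real data you would pass sim_pearson from the listing instead.
def same_similarity(prefs, p1, p2):
    return 1.0


# Hypothetical toy data
toy = {
    'me': {'A': 4.0},
    'u1': {'A': 4.0, 'Obscure': 5.0, 'Popular': 4.0},
    'u2': {'A': 5.0, 'Popular': 5.0},
}
print(get_recommendations_min_support(toy, 'me', same_similarity))
# [(4.5, 'Popular')]  ('Obscure' is dropped: only one user rated it)
```

A higher `min_raters` makes the top of the list more trustworthy but recommends fewer niche movies, so the threshold is a trade-off to tune on your own data.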
That's all; corrections are welcome.