PySpark usage and operations (basic notes)

Spark provides a Python shell, pyspark, which lets you write Spark programs in Python interactively.
For an introduction to Spark's basic architecture, see http://blog.csdn.net/cymy001/article/details/78483614;
for PySpark environment setup, see http://blog.csdn.net/cymy001/article/details/78430892.

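The core entry point in pyspark is SparkContext (sc for short), and the most important data carrier is the RDD. An RDD resembles a NumPy array or a Pandas Series and can be viewed as an ordered collection of items. The difference is that these items do not live in the driver's memory: they are split into many partitions, and each partition's data is held in the memory of an executor in the cluster.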

Importing the pyspark modules in Python

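import pyspark
from pyspark import SparkContext
from pyspark import SparkConf
conf=SparkConf().setAppName("miniProject").setMaster("local[*]")
sc=SparkContext.getOrCreate(conf)
#Every Spark program starts from a SparkContext. Initializing a SparkContext requires a SparkConf object, which holds the Spark cluster configuration parameters (such as the master URL). Once initialized, the methods of the SparkContext object can be used to create and operate on RDDs and shared variables. The Spark shell initializes a SparkContext automatically (available for Scala and Python, but not for Java).
#getOrCreate returns the existing SparkContext if there is one, and creates a new one otherwise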

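SparkSession is a concept introduced in Spark 2.0. It gives users a unified entry point to Spark's features. In earlier versions of Spark, SparkContext was the main entry point: because the RDD was the primary API, RDDs were created and manipulated through SparkContext, and every other API needed its own context. For example, Streaming used StreamingContext, SQL used SQLContext, and Hive used HiveContext. As the Dataset and DataFrame APIs became the standard APIs, they needed an entry point of their own, so Spark 2.0 introduced SparkSession as the entry point for the Dataset and DataFrame APIs. SparkSession is essentially a combination of SQLContext and HiveContext (StreamingContext may be added in the future), so the APIs available on SQLContext and HiveContext are also available on SparkSession. SparkSession wraps a SparkContext internally, so the computation is actually carried out by the SparkContext.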

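Ways to initialize an RDD

(1) If a sequence of data already exists in local memory (for example a Python list), sc.parallelize can initialize an RDD from it. When this runs, the elements of the list are automatically partitioned, and the partitions are distributed across the machines of the cluster.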

import pyspark
from pyspark import SparkContext
from pyspark import SparkConf
conf=SparkConf().setAppName("miniProject").setMaster("local[*]")
sc=SparkContext.getOrCreate(conf)

#(a) Create an RDD from a list. sc.parallelize turns a Python list, NumPy array, or Pandas Series into a Spark RDD; a Pandas DataFrame iterates over its column labels, so convert its rows to a list first.
rdd = sc.parallelize([1,2,3,4,5])
rdd
#Output:ParallelCollectionRDD[0] at parallelize at PythonRDD.scala:480

#(b) getNumPartitions() shows how many partitions the list was split into
rdd.getNumPartitions() 
#Output:4
#(c) glom().collect() shows the contents of each partition
rdd.glom().collect()
#Output:[[1], [2], [3], [4, 5]]

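This example ran on a laptop with a 4-core CPU: with local[*], Spark runs one worker thread per core, so the data was split into 4 partitions. Note that collect() is dangerous: it pulls every item back to the driver, so on data at the terabyte scale it will exhaust the driver's memory.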
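The partition layout [[1], [2], [3], [4, 5]] can be reproduced in plain Python. This is a sketch that assumes the position-based rule Spark applies when slicing a local collection (slice i covers positions i*n//k up to (i+1)*n//k). The helpers slice_positions and glom_like are hypothetical names, not part of the Spark API.

```python
def slice_positions(n, num_slices):
    """Return (start, end) index pairs that split n items into num_slices runs."""
    return [
        (i * n // num_slices, (i + 1) * n // num_slices)
        for i in range(num_slices)
    ]


def glom_like(items, num_slices):
    """Group items into contiguous partitions, mimicking rdd.glom().collect()."""
    items = list(items)
    return [items[start:end] for start, end in slice_positions(len(items), num_slices)]


parts = glom_like([1, 2, 3, 4, 5], 4)
print(parts)  # [[1], [2], [3], [4, 5]]
```

With 5 items and 4 partitions the integer division leaves the extra item in the last slice, which matches the glom() output shown above.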

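(2) Another way to create an RDD is to read a text file directly into it. Each line of the text becomes one item. Note that Spark usually resolves a given path against HDFS by default; to read a file from the local file system, give a full path that starts with file:// (on Windows, file:\\).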

import pyspark
from pyspark import SparkContext
from pyspark import SparkConf
conf=SparkConf().setAppName("miniProject").setMaster("local[*]")
sc=SparkContext.getOrCreate(conf)

#(a) Record the current pyspark working directory
import os
cwd=os.getcwd()  
cwd
#Output:'C:\\Users\\Yu\\0JulyLearn\\5weekhadoopspark'

#(b) Full path of the file to read
rdd=sc.textFile("file:\\\\\\" + cwd + "\\names\\yob1880.txt")
rdd
#Output:file:\\\C:\Users\Yu\0JulyLearn\5weekhadoopspark\names\yob1880.txt MapPartitionsRDD[3] at textFile at NativeMethodAccessorImpl.java:0

#(c) first() returns the first item of the RDD that was read
rdd.first()
#Output:'Mary,F,7065'
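Building the file: prefix by hand with escaped backslashes is easy to get wrong. As an alternative sketch, the standard-library pathlib can produce the URI; local_file_uri is a hypothetical helper name, and names/yob1880.txt is the file used in the example above.

```python
from pathlib import Path


def local_file_uri(*parts):
    """Build a file:// URI for a path relative to the current working directory."""
    # Path.cwd() is absolute, which as_uri() requires.
    return Path.cwd().joinpath(*parts).as_uri()


uri = local_file_uri("names", "yob1880.txt")
print(uri)
```

The resulting string starts with file:/// on both Windows and POSIX systems and can be passed to sc.textFile(uri) in place of the hand-built path.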

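sc.wholeTextFiles can even read every file in a folder. Note, however, that with this method each item in the RDD is a tuple of the form (file name, entire file content). The example below points it at a single file, so the RDD holds one such tuple.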

import pyspark
from pyspark import SparkContext
from pyspark import SparkConf
conf=SparkConf().setAppName("miniProject").setMaster("local[*]")
sc=SparkContext.getOrCreate(conf)

#Record the current pyspark working directory
import os
cwd=os.getcwd()  
cwd
#Output:'C:\\Users\\Yu\\0JulyLearn\\5weekhadoopspark'

rdd = sc.wholeTextFiles("file:\\\\\\" + cwd + "\\names\\yob1880.txt")
rdd
#Output:org.apache.spark.api.java.JavaPairRDD@12bcc15
rdd.first()

Output:
('file:/C:/Users/Yu/0JulyLearn/5weekhadoopspark/names/yob1880.txt',
 'Mary,F,7065\r\nAnna,F,2604\r\nEmma,F,2003\r\nElizabeth,F,1939\r\nMinnie,F,1746\r\nMargaret,F,1578\r\nIda,F,1472\r\nAlice,F,1414\r\nBertha,F,1320\r\nSarah,F,1288\r\nAnnie,F,1258\r\nClara,F,1226\r\nElla,F,1156\r\nFlorence,F,1063\r\nCora,F,1045\r\nMartha,F,1040\r\nLaura,F,1012\r\nNellie,F,995\r\nGrace,F,982\r\nCarrie,F,949\r\nMaude,F,858\r\nMabel,F,808\r\nBessie,F,796\r\nJennie,F,793\r\nGertrude,F,787\r\nJulia,F,783\r\nHattie,F,769\r\nEdith,F,768\r\nMattie,F,704\r\nRose,F,700\r\nCatherine,F,688\r\nLillian,F,672\r\nAda,F,652\r\nLillie,F,647\r\nHelen,F,636\r\nJessie,F,635\r\nLouise,F,635\r\nEthel,F,633\r\nLula,F,621\r\nMyrtle,F,615\r\nEva,F,614\r\nFrances,F,605\r\nLena,F,603\r\nLucy,F,590\r\nEdna,F,588\r\nMaggie,F,582\r\nPearl,F,569\r\nDaisy,F,564\r\nFannie,F,560\r\nJosephine,F,544\r\nDora,F,524\r\nRosa,F,507\r\nKatherine,F,502\r\nAgnes,F,473\r\nMarie,F,471\r\nNora,F,471\r\nMay,F,462\r\nMamie,F,436\r\nBlanche,F,427\r\nStella,F,414\r\nEllen,F,411\r\nNancy,F,411\r\nEffie,F,406\r\nSallie,F,404\r\nNettie,F,403\r\nDella,F,391\r\nLizzie,F,388\r\nFlora,F,365\r\nSusie,F,361\r\nMaud,F,345\r\nMae,F,344\r\nEtta,F,323\r\nHarriet,F,319\r\nSadie,F,317\r\nCaroline,F,306\r\nKatie,F,303\r\nLydia,F,302\r\nElsie,F,301\r\nKate,F,299\r\nSusan,F,286\r\nMollie,F,283\r\nAlma,F,277\r\nAddie,F,274\r\nGeorgia,F,259\r\nEliza,F,252\r\nLulu,F,249\r\nNannie,F,248\r\nLottie,F,245\r\nAmanda,F,241\r\nBelle,F,238\r\nCharlotte,F,237\r\nRebecca,F,236\r\nRuth,F,234\r\nViola,F,229\r\nOlive,F,224\r\nAmelia,F,221\r\nHannah,F,221\r\nJane,F,215\r\nVirginia,F,213\r\nEmily,F,210\r\nMatilda,F,210\r\nIrene,F,204\r\nKathryn,F,204\r\nEsther,F,198\r\nWillie,F,192\r\nHenrietta,F,191\r\nOllie,F,183\r\nAmy,F,167\r\nRachel,F,166\r\nSara,F,165\r\nEstella,F,162\r\nTheresa,F,153\r\nAugusta,F,151\r\nOra,F,149\r\nPauline,F,144\r\nJosie,F,141\r\nLola,F,138\r\nSophia,F,138\r\nLeona,F,137\r\nAnne,F,136\r\nMildred,F,132\r\nAnn,F,131\r\nBeulah,F,131\r\nCallie,F,131\r\nLou,F,131\r\nDelia,F,129\r\nEleanor,F,129\r\nBarbara,F,127\r\nIv
a,F,127\r\nLouisa,F,126\r\nMaria,F,125\r\nMayme,F,124\r\nEvelyn,F,122\r\nEstelle,F,119\r\nNina,F,119\r\nBetty,F,117\r\nMarion,F,115\r\nBettie,F,113\r\nDorothy,F,112\r\nLuella,F,111\r\nInez,F,106\r\nLela,F,106\r\nRosie,F,106\r\nAllie,F,105\r\nMillie,F,105\r\nJanie,F,96\r\nCornelia,F,94\r\nVictoria,F,93\r\nRuby,F,92\r\nWinifred,F,92\r\nAlta,F,91\r\nCelia,F,90\r\nChristine,F,89\r\nBeatrice,F,87\r\nBirdie,F,85\r\nHarriett,F,83\r\nMable,F,83\r\nMyra,F,83\r\nSophie,F,83\r\nTillie,F,83\r\nIsabel,F,81\r\nSylvia,F,81\r\nCarolyn,F,80\r\nIsabelle,F,80\r\nLeila,F,80\r\nSally,F,80\r\nIna,F,79\r\nEssie,F,78\r\nBertie,F,77\r\nNell,F,77\r\nAlberta,F,76\r\nKatharine,F,76\r\nLora,F,74\r\nRena,F,74\r\nMina,F,73\r\nRhoda,F,73\r\nMathilda,F,72\r\nAbbie,F,71\r\nEula,F,70\r\nDollie,F,69\r\nHettie,F,69\r\nEunice,F,67\r\nFanny,F,67\r\nOla,F,67\r\nLenora,F,66\r\nAdelaide,F,65\r\nChristina,F,65\r\nLelia,F,65\r\nNelle,F,65\r\nSue,F,65\r\nJohanna,F,64\r\nLilly,F,64\r\nLucinda,F,63\r\nMinerva,F,63\r\nLettie,F,62\r\nRoxie,F,62\r\nCynthia,F,61\r\nHelena,F,60\r\nHilda,F,60\r\nHulda,F,60\r\nBernice,F,59\r\nGenevieve,F,59\r\nJean,F,59\r\nCordelia,F,58\r\nMarian,F,56\r\nFrancis,F,55\r\nJeanette,F,55\r\nAdeline,F,54\r\nGussie,F,54\r\nLeah,F,54\r\nLois,F,53\r\nLura,F,53\r\nMittie,F,53\r\nHallie,F,51\r\nIsabella,F,50\r\nOlga,F,50\r\nPhoebe,F,50\r\nTeresa,F,50\r\nHester,F,49\r\nLida,F,49\r\nLina,F,49\r\nWinnie,F,49\r\nClaudia,F,48\r\nMarguerite,F,48\r\nVera,F,48\r\nCecelia,F,47\r\nBess,F,46\r\nEmilie,F,46\r\nJohn,F,46\r\nRosetta,F,46\r\nVerna,F,46\r\nMyrtie,F,45\r\nCecilia,F,44\r\nElva,F,44\r\nOlivia,F,44\r\nOphelia,F,44\r\nGeorgie,F,43\r\nElnora,F,42\r\nViolet,F,42\r\nAdele,F,41\r\nLily,F,41\r\nLinnie,F,41\r\nLoretta,F,41\r\nMadge,F,41\r\nPolly,F,41\r\nVirgie,F,41\r\nEugenia,F,40\r\nLucile,F,40\r\nLucille,F,40\r\nMabelle,F,39\r\nRosalie,F,39\r\nKittie,F,38\r\nMeta,F,37\r\nAngie,F,36\r\nDessie,F,36\r\nGeorgiana,F,36\r\nLila,F,36\r\nRegina,F,36\r\nSelma,F,36\r\nWilhelmina,F,36\r\nBridget,F,35\r\nLilla,F,35
\r\nMalinda,F,35\r\nVina,F,35\r\nFreda,F,34\r\nGertie,F,34\r\nJeannette,F,34\r\nLouella,F,34\r\nMandy,F,34\r\nRoberta,F,34\r\nCassie,F,33\r\nCorinne,F,33\r\nIvy,F,33\r\nMelissa,F,33\r\nLyda,F,32\r\nNaomi,F,32\r\nNorma,F,32\r\nBell,F,31\r\nMargie,F,31\r\nNona,F,31\r\nZella,F,31\r\nDovie,F,30\r\nElvira,F,30\r\nErma,F,30\r\nIrma,F,30\r\nLeota,F,30\r\nWilliam,F,30\r\nArtie,F,29\r\nBlanch,F,29\r\nCharity,F,29\r\nLorena,F,29\r\nLucretia,F,29\r\nOrpha,F,29\r\nAlvina,F,28\r\nAnnette,F,28\r\nCatharine,F,28\r\nElma,F,28\r\nGeneva,F,28\r\nJanet,F,28\r\nLee,F,28\r\nLeora,F,28\r\nLona,F,28\r\nMiriam,F,28\r\nZora,F,28\r\nLinda,F,27\r\nOctavia,F,27\r\nSudie,F,27\r\nZula,F,27\r\nAdella,F,26\r\nAlpha,F,26\r\nFrieda,F,26\r\nGeorge,F,26\r\nJoanna,F,26\r\nLeonora,F,26\r\nPriscilla,F,26\r\nTennie,F,26\r\nAngeline,F,25\r\nDocia,F,25\r\nEttie,F,25\r\nFlossie,F,25\r\nHanna,F,25\r\nLetha,F,25\r\nMinta,F,25\r\nRetta,F,25\r\nRosella,F,25\r\nAdah,F,24\r\nBerta,F,24\r\nElisabeth,F,24\r\nElise,F,24\r\nGoldie,F,24\r\nLeola,F,24\r\nMargret,F,24\r\nAdaline,F,23\r\nFloy,F,23\r\nIdella,F,23\r\nJuanita,F,23\r\nLenna,F,23\r\nLucie,F,23\r\nMissouri,F,23\r\nNola,F,23\r\nZoe,F,23\r\nEda,F,22\r\nIsabell,F,22\r\nJames,F,22\r\nJulie,F,22\r\nLetitia,F,22\r\nMadeline,F,22\r\nMalissa,F,22\r\nMariah,F,22\r\nPattie,F,22\r\nVivian,F,22\r\nAlmeda,F,21\r\nAurelia,F,21\r\nClaire,F,21\r\nDolly,F,21\r\nHazel,F,21\r\nJannie,F,21\r\nKathleen,F,21\r\nKathrine,F,21\r\nLavinia,F,21\r\nMarietta,F,21\r\nMelvina,F,21\r\nOna,F,21\r\nPinkie,F,21\r\nSamantha,F,21\r\nSusanna,F,21\r\nChloe,F,20\r\nDonnie,F,20\r\nElsa,F,20\r\nGladys,F,20\r\nMatie,F,20\r\nPearle,F,20\r\nVesta,F,20\r\nVinnie,F,20\r\nAntoinette,F,19\r\nClementine,F,19\r\nEdythe,F,19\r\nHarriette,F,19\r\nLibbie,F,19\r\nLilian,F,19\r\nLue,F,19\r\nLutie,F,19\r\nMagdalena,F,19\r\nMeda,F,19\r\nRita,F,19\r\nTena,F,19\r\nZelma,F,19\r\nAdelia,F,18\r\nAnnetta,F,18\r\nAntonia,F,18\r\nDona,F,18\r\nElizebeth,F,18\r\nGeorgianna,F,18\r\nGracie,F,18\r\nIona,F,18\r\nLessie,F,18\r\nLet
a,F,18\r\nLiza,F,18\r\nMertie,F,18\r\nMolly,F,18\r\nNeva,F,18\r\nOma,F,18\r\nAlida,F,17\r\nAlva,F,17\r\nCecile,F,17\r\nCleo,F,17\r\nDonna,F,17\r\nEllie,F,17\r\nErnestine,F,17\r\nEvie,F,17\r\nFrankie,F,17\r\nHelene,F,17\r\nMinna,F,17\r\nMyrta,F,17\r\nPrudence,F,17\r\nQueen,F,17\r\nRilla,F,17\r\nSavannah,F,17\r\nTessie,F,17\r\nTina,F,17\r\nAgatha,F,16\r\nAmerica,F,16\r\nAnita,F,16\r\nArminta,F,16\r\nDorothea,F,16\r\nIra,F,16\r\nLuvenia,F,16\r\nMarjorie,F,16\r\nMaybelle,F,16\r\nMellie,F,16\r\nNan,F,16\r\nPearlie,F,16\r\nSidney,F,16\r\nVelma,F,16\r\nClare,F,15\r\nConstance,F,15\r\nDixie,F,15\r\nIla,F,15\r\nIola,F,15\r\nJimmie,F,15\r\nLouvenia,F,15\r\nLucia,F,15\r\nLudie,F,15\r\nLuna,F,15\r\nMetta,F,15\r\nPatsy,F,15\r\nPhebe,F,15\r\nSophronia,F,15\r\nAdda,F,14\r\nAvis,F,14\r\nBetsy,F,14\r\nBonnie,F,14\r\nCecil,F,14\r\nCordie,F,14\r\nEmmaline,F,14\r\nEthelyn,F,14\r\nHortense,F,14\r\nJune,F,14\r\nLouie,F,14\r\nLovie,F,14\r\nMarcella,F,14\r\nMelinda,F,14\r\nMona,F,14\r\nOdessa,F,14\r\nVeronica,F,14\r\nAimee,F,13\r\nAnnabel,F,13\r\nAva,F,13\r\nBella,F,13\r\nCarolina,F,13\r\nCathrine,F,13\r\nChristena,F,13\r\nClyde,F,13\r\nDena,F,13\r\nDolores,F,13\r\nEleanore,F,13\r\nElmira,F,13\r\nFay,F,13\r\nFrank,F,13\r\nJenny,F,13\r\nKizzie,F,13\r\nLonnie,F,13\r\nLoula,F,13\r\nMagdalene,F,13\r\nMettie,F,13\r\nMintie,F,13\r\nPeggy,F,13\r\nReba,F,13\r\nSerena,F,13\r\nVida,F,13\r\nZada,F,13\r\nAbigail,F,12\r\nCelestine,F,12\r\nCelina,F,12\r\nClaudie,F,12\r\nClemmie,F,12\r\nConnie,F,12\r\nDaisie,F,12\r\nDeborah,F,12\r\nDessa,F,12\r\nEaster,F,12\r\nEddie,F,12\r\nEmelia,F,12\r\nEmmie,F,12\r\nImogene,F,12\r\nIndia,F,12\r\nJeanne,F,12\r\nJoan,F,12\r\nLenore,F,12\r\nLiddie,F,12\r\nLotta,F,12\r\nMame,F,12\r\nNevada,F,12\r\nRachael,F,12\r\nSina,F,12\r\nWilla,F,12\r\nAline,F,11\r\nBeryl,F,11\r\nCharles,F,11\r\nDaisey,F,11\r\nDorcas,F,11\r\nEdmonia,F,11\r\nEffa,F,11\r\nEldora,F,11\r\nEloise,F,11\r\nEmmer,F,11\r\nEra,F,11\r\nGena,F,11\r\nHenry,F,11\r\nIris,F,11\r\nIzora,F,11\r\nLennie,F,11\r\nLissie,F
,11\r\nMallie,F,11\r\nMalvina,F,11\r\nMathilde,F,11\r\nMazie,F,11\r\nQueenie,F,11\r\nRobert,F,11\r\nRosina,F,11\r\nSalome,F,11\r\nTheodora,F,11\r\nTherese,F,11\r\nVena,F,11\r\nWanda,F,11\r\nWilda,F,11\r\nAltha,F,10\r\nAnastasia,F,10\r\nBesse,F,10\r\nBird,F,10\r\nBirtie,F,10\r\nClarissa,F,10\r\nClaude,F,10\r\nDelilah,F,10\r\nDiana,F,10\r\nEmelie,F,10\r\nErna,F,10\r\nFern,F,10\r\nFlorida,F,10\r\nFrona,F,10\r\nHilma,F,10\r\nJoseph,F,10\r\nJuliet,F,10\r\nLeonie,F,10\r\nLugenia,F,10\r\nMammie,F,10\r\nManda,F,10\r\nManerva,F,10\r\nManie,F,10\r\nNella,F,10\r\nPaulina,F,10\r\nPhilomena,F,10\r\nRae,F,10\r\nSelina,F,10\r\nSena,F,10\r\nTheodosia,F,10\r\nTommie,F,10\r\nUna,F,10\r\nVernie,F,10\r\nAdela,F,9\r\nAlthea,F,9\r\nAmalia,F,9\r\nAmber,F,9\r\nAngelina,F,9\r\nAnnabelle,F,9\r\nAnner,F,9\r\nArie,F,9\r\nClarice,F,9\r\nCorda,F,9\r\nCorrie,F,9\r\nDell,F,9\r\nDellar,F,9\r\nDonie,F,9\r\nDoris,F,9\r\nElda,F,9\r\nElinor,F,9\r\nEmeline,F,9\r\nEmilia,F,9\r\nEsta,F,9\r\nEstell,F,9\r\nEtha,F,9\r\nFred,F,9\r\nHope,F,9\r\nIndiana,F,9\r\nIone,F,9\r\nJettie,F,9\r\nJohnnie,F,9\r\nJosiephine,F,9\r\nKitty,F,9\r\nLavina,F,9\r\nLeda,F,9\r\nLetta,F,9\r\nMahala,F,9\r\nMarcia,F,9\r\nMargarette,F,9\r\nMaudie,F,9\r\nMaye,F,9\r\nNorah,F,9\r\nOda,F,9\r\nPatty,F,9\r\nPaula,F,9\r\nPermelia,F,9\r\nRosalia,F,9\r\nRoxanna,F,9\r\nSula,F,9\r\nVada,F,9\r\nWinnifred,F,9\r\nAdline,F,8\r\nAlmira,F,8\r\nAlvena,F,8\r\nArizona,F,8\r\nBecky,F,8\r\nBennie,F,8\r\nBernadette,F,8\r\nCamille,F,8\r\nCordia,F,8\r\nCorine,F,8\r\nDicie,F,8\r\nDove,F,8\r\nDrusilla,F,8\r\nElena,F,8\r\nElenora,F,8\r\nElmina,F,8\r\nEthyl,F,8\r\nEvalyn,F,8\r\nEvelina,F,8\r\nFaye,F,8\r\nHuldah,F,8\r\nIdell,F,8\r\nInga,F,8\r\nIrena,F,8\r\nJewell,F,8\r\nKattie,F,8\r\nLavenia,F,8\r\nLeslie,F,8\r\nLovina,F,8\r\nLulie,F,8\r\nMagnolia,F,8\r\nMargeret,F,8\r\nMargery,F,8\r\nMedia,F,8\r\nMillicent,F,8\r\nNena,F,8\r\nOcie,F,8\r\nOrilla,F,8\r\nOsie,F,8\r\nPansy,F,8\r\nRay,F,8\r\nRosia,F,8\r\nRowena,F,8\r\nShirley,F,8\r\nTabitha,F,8\r\nThomas,F,8\r\nVerdie,F,
8\r\nWalter,F,8\r\nZetta,F,8\r\nZoa,F,8\r\nZona,F,8\r\nAlbertina,F,7\r\nAlbina,F,7\r\nAlyce,F,7\r\nAmie,F,7\r\nAngela,F,7\r\nAnnis,F,7\r\nCarol,F,7\r\nCarra,F,7\r\nClarence,F,7\r\nClarinda,F,7\r\nDelphia,F,7\r\nDillie,F,7\r\nDoshie,F,7\r\nDrucilla,F,7\r\nEtna,F,7\r\nEugenie,F,7\r\nEulalia,F,7\r\nEve,F,7\r\nFelicia,F,7\r\nFlorance,F,7\r\nFronie,F,7\r\nGeraldine,F,7\r\nGina,F,7\r\nGlenna,F,7\r\nGrayce,F,7\r\nHedwig,F,7\r\nJessica,F,7\r\nJossie,F,7\r\nKatheryn,F,7\r\nKaty,F,7\r\nLea,F,7\r\nLeanna,F,7\r\nLeitha,F,7\r\nLeone,F,7\r\nLidie,F,7\r\nLoma,F,7\r\nLular,F,7\r\nMagdalen,F,7\r\nMaymie,F,7\r\nMinervia,F,7\r\nMuriel,F,7\r\nNeppie,F,7\r\nOlie,F,7\r\nOnie,F,7\r\nOsa,F,7\r\nOtelia,F,7\r\nParalee,F,7\r\nPatience,F,7\r\nRella,F,7\r\nRillie,F,7\r\nRosanna,F,7\r\nTheo,F,7\r\nTilda,F,7\r\nTishie,F,7\r\nTressa,F,7\r\nViva,F,7\r\nYetta,F,7\r\nZena,F,7\r\nZola,F,7\r\nAbby,F,6\r\nAileen,F,6\r\nAlba,F,6\r\nAlda,F,6\r\nAlla,F,6\r\nAlverta,F,6\r\nAra,F,6\r\nArdelia,F,6\r\nArdella,F,6\r\nArrie,F,6\r\nArvilla,F,6\r\nAugustine,F,6\r\nAurora,F,6\r\nBama,F,6\r\nBena,F,6\r\nByrd,F,6\r\nCalla,F,6\r\nCamilla,F,6\r\nCarey,F,6\r\nCarlotta,F,6\r\nCelestia,F,6\r\nCherry,F,6\r\nCinda,F,6\r\nClassie,F,6\r\nClaudine,F,6\r\nClemie,F,6\r\nClifford,F,6\r\nClyda,F,6\r\nCreola,F,6\r\nDebbie,F,6\r\nDee,F,6\r\nDinah,F,6\r\nDoshia,F,6\r\nEdnah,F,6\r\nEdyth,F,6\r\nEleanora,F,6\r\nElecta,F,6\r\nEola,F,6\r\nErie,F,6\r\nEudora,F,6\r\nEuphemia,F,6\r\nEvalena,F,6\r\nEvaline,F,6\r\nFaith,F,6\r\nFidelia,F,6\r\nFreddie,F,6\r\nGolda,F,6\r\nHarry,F,6\r\nHelma,F,6\r\nHermine,F,6\r\nHessie,F,6\r\nIvah,F,6\r\nJanette,F,6\r\nJennette,F,6\r\nJoella,F,6\r\nKathryne,F,6\r\nLacy,F,6\r\nLanie,F,6\r\nLauretta,F,6\r\nLeana,F,6\r\nLeatha,F,6\r\nLeo,F,6\r\nLiller,F,6\r\nLillis,F,6\r\nLouetta,F,6\r\nMadie,F,6\r\nMai,F,6\r\nMartina,F,6\r\nMaryann,F,6\r\nMelva,F,6\r\nMena,F,6\r\nMercedes,F,6\r\nMerle,F,6\r\nMima,F,6\r\nMinda,F,6\r\nMonica,F,6\r\nNealie,F,6\r\nNetta,F,6\r\nNolia,F,6\r\nNonie,F,6\r\nOdelia,F,6\r\nOttilie,F,6\r\nPhy
llis,F,6\r\nRobbie,F,6\r\nSabina,F,6\r\nSada,F,6\r\nSammie,F,6\r\nSuzanne,F,6\r\nSybilla,F,6\r\nThea,F,6\r\nTressie,F,6\r\nVallie,F,6\r\nVenie,F,6\r\nViney,F,6\r\nWilhelmine,F,6\r\nWinona,F,6\r\nZelda,F,6\r\nZilpha,F,6\r\nAdelle,F,5\r\nAdina,F,5\r\nAdrienne,F,5\r\nAlbertine,F,5\r\nAlys,F,5\r\nAna,F,5\r\nAraminta,F,5\r\nArthur,F,5\r\nBirtha,F,5\r\nBulah,F,5\r\nCaddie,F,5\r\nCelie,F,5\r\nCharlotta,F,5\r\nClair,F,5\r\nConcepcion,F,5\r\nCordella,F,5\r\nCorrine,F,5\r\nDelila,F,5\r\nDelphine,F,5\r\nDosha,F,5\r\nEdgar,F,5\r\nElaine,F,5\r\nElisa,F,5\r\nEllar,F,5\r\nElmire,F,5\r\nElvina,F,5\r\nEna,F,5\r\nEstie,F,5\r\nEtter,F,5\r\nFronnie,F,5\r\nGenie,F,5\r\nGeorgina,F,5\r\nGlenn,F,5\r\nGracia,F,5\r\nGuadalupe,F,5\r\nGwendolyn,F,5\r\nHassie,F,5\r\nHonora,F,5\r\nIcy,F,5\r\nIsa,F,5\r\nIsadora,F,5\r\nJesse,F,5\r\nJewel,F,5\r\nJoe,F,5\r\nJohannah,F,5\r\nJuana,F,5\r\nJudith,F,5\r\nJudy,F,5\r\nJunie,F,5\r\nLavonia,F,5\r\nLella,F,5\r\nLemma,F,5\r\nLetty,F,5\r\nLinna,F,5\r\nLittie,F,5\r\nLollie,F,5\r\nLorene,F,5\r\nLouis,F,5\r\nLove,F,5\r\nLovisa,F,5\r\nLucina,F,5\r\nLynn,F,5\r\nMadora,F,5\r\nMahalia,F,5\r\nManervia,F,5\r\nManuela,F,5\r\nMargarett,F,5\r\nMargaretta,F,5\r\nMargarita,F,5\r\nMarilla,F,5\r\nMignon,F,5\r\nMozella,F,5\r\nNatalie,F,5\r\nNelia,F,5\r\nNolie,F,5\r\nOmie,F,5\r\nOpal,F,5\r\nOssie,F,5\r\nOttie,F,5\r\nOttilia,F,5\r\nParthenia,F,5\r\nPenelope,F,5\r\nPinkey,F,5\r\nPollie,F,5\r\nRennie,F,5\r\nReta,F,5\r\nRoena,F,5\r\nRosalee,F,5\r\nRoseanna,F,5\r\nRuthie,F,5\r\nSabra,F,5\r\nSannie,F,5\r\nSelena,F,5\r\nSibyl,F,5\r\nTella,F,5\r\nTempie,F,5\r\nTennessee,F,5\r\nTeressa,F,5\r\nTexas,F,5\r\nTheda,F,5\r\nThelma,F,5\r\nThursa,F,5\r\nUla,F,5\r\nVannie,F,5\r\nVerona,F,5\r\nVertie,F,5\r\nWilma,F,5\r\nJohn,M,9655\r\nWilliam,M,9532\r\nJames,M,5927\r\nCharles,M,5348\r\nGeorge,M,5126\r\nFrank,M,3242\r\nJoseph,M,2632\r\nThomas,M,2534\r\nHenry,M,2444\r\nRobert,M,2415\r\nEdward,M,2364\r\nHarry,M,2152\r\nWalter,M,1755\r\nArthur,M,1599\r\nFred,M,1569\r\nAlbert,M,1493\r\nSamuel,M,1024\r\
nDavid,M,869\r\nLouis,M,828\r\nJoe,M,731\r\nCharlie,M,730\r\nClarence,M,730\r\nRichard,M,728\r\nAndrew,M,644\r\nDaniel,M,643\r\nErnest,M,615\r\nWill,M,588\r\nJesse,M,569\r\nOscar,M,544\r\nLewis,M,517\r\nPeter,M,496\r\nBenjamin,M,490\r\nFrederick,M,483\r\nWillie,M,476\r\nAlfred,M,469\r\nSam,M,457\r\nRoy,M,440\r\nHerbert,M,424\r\nJacob,M,404\r\nTom,M,399\r\nElmer,M,373\r\nCarl,M,372\r\nLee,M,361\r\nHoward,M,357\r\nMartin,M,357\r\nMichael,M,354\r\nBert,M,348\r\nHerman,M,347\r\nJim,M,345\r\nFrancis,M,344\r\nHarvey,M,344\r\nEarl,M,335\r\nEugene,M,328\r\nRalph,M,317\r\nEd,M,310\r\nClaude,M,309\r\nEdwin,M,309\r\nBen,M,305\r\nCharley,M,305\r\nPaul,M,301\r\nEdgar,M,283\r\nIsaac,M,274\r\nOtto,M,271\r\nLuther,M,260\r\nLawrence,M,257\r\nIra,M,249\r\nPatrick,M,248\r\nGuy,M,239\r\nOliver,M,234\r\nTheodore,M,232\r\nHugh,M,224\r\nClyde,M,221\r\nAlexander,M,211\r\nAugust,M,210\r\nFloyd,M,206\r\nHomer,M,205\r\nJack,M,204\r\nLeonard,M,200\r\nHorace,M,199\r\nMarion,M,189\r\nPhilip,M,186\r\nAllen,M,184\r\nArchie,M,183\r\nStephen,M,176\r\nChester,M,168\r\nWillis,M,166\r\nRaymond,M,165\r\nRufus,M,163\r\nWarren,M,158\r\nJessie,M,154\r\nMilton,M,149\r\nAlex,M,147\r\nLeo,M,147\r\nJulius,M,143\r\nRay,M,142\r\nSidney,M,142\r\nBernard,M,140\r\nDan,M,140\r\nJerry,M,136\r\nCalvin,M,134\r\nPerry,M,134\r\nDave,M,131\r\nAnthony,M,130\r\nEddie,M,129\r\nAmos,M,128\r\nDennis,M,128\r\nClifford,M,127\r\nLeroy,M,124\r\nWesley,M,123\r\nAlonzo,M,122\r\nGarfield,M,122\r\nFranklin,M,120\r\nEmil,M,119\r\nLeon,M,118\r\nNathan,M,114\r\nHarold,M,113\r\nMatthew,M,113\r\nLevi,M,112\r\nMoses,M,111\r\nEverett,M,110\r\nLester,M,109\r\nWinfield,M,108\r\nAdam,M,104\r\nLloyd,M,104\r\nMack,M,104\r\nFredrick,M,103\r\nJay,M,103\r\nJess,M,103\r\nMelvin,M,103\r\nNoah,M,103\r\nAaron,M,102\r\nAlvin,M,102\r\nNorman,M,102\r\nGilbert,M,101\r\nElijah,M,100\r\nVictor,M,100\r\nGus,M,99\r\nNelson,M,99\r\nJasper,M,98\r\nSilas,M,98\r\nJake,M,96\r\nChristopher,M,95\r\nMike,M,95\r\nPercy,M,94\r\nAdolph,M,93\r\nMaurice,M,93\r\nCornelius,M,
92\r\nFelix,M,92\r\nReuben,M,92\r\nWallace,M,91\r\nClaud,M,90\r\nRoscoe,M,90\r\nSylvester,M,89\r\nEarnest,M,88\r\nHiram,M,88\r\nOtis,M,88\r\nSimon,M,88\r\nWillard,M,88\r\nIrvin,M,86\r\nMark,M,85\r\nJose,M,84\r\nWilbur,M,82\r\nAbraham,M,81\r\nVirgil,M,81\r\nClinton,M,79\r\nElbert,M,79\r\nLeslie,M,79\r\nMarshall,M,78\r\nOwen,M,78\r\nWiley,M,78\r\nAnton,M,77\r\nMorris,M,77\r\nManuel,M,75\r\nPhillip,M,75\r\nAugustus,M,74\r\nEmmett,M,74\r\nEli,M,73\r\nNicholas,M,73\r\nWilson,M,72\r\nAlva,M,70\r\nHarley,M,70\r\nNewton,M,70\r\nTimothy,M,70\r\nMarvin,M,69\r\nRoss,M,69\r\nCurtis,M,68\r\nEdmund,M,67\r\nJeff,M,66\r\nElias,M,65\r\nHarrison,M,65\r\nStanley,M,65\r\nColumbus,M,64\r\nLon,M,64\r\nOra,M,64\r\nOllie,M,63\r\nPearl,M,62\r\nRussell,M,62\r\nSolomon,M,62\r\nArch,M,61\r\nAsa,M,60\r\nClayton,M,60\r\nEnoch,M,60\r\nIrving,M,60\r\nMathew,M,60\r\nNathaniel,M,60\r\nScott,M,60\r\nHubert,M,59\r\nLemuel,M,59\r\nAndy,M,58\r\nEllis,M,58\r\nEmanuel,M,57\r\nJoshua,M,57\r\nMillard,M,56\r\nVernon,M,56\r\nWade,M,56\r\nCyrus,M,54\r\nMiles,M,54\r\nRudolph,M,54\r\nSherman,M,54\r\nAustin,M,53\r\nBill,M,53\r\nChas,M,53\r\nLonnie,M,53\r\nMonroe,M,53\r\nByron,M,52\r\nEdd,M,52\r\nEmery,M,52\r\nGrant,M,52\r\nJerome,M,52\r\nMax,M,52\r\nMose,M,52\r\nSteve,M,52\r\nGordon,M,51\r\nAbe,M,50\r\nPete,M,50\r\nChris,M,49\r\nClark,M,49\r\nGustave,M,49\r\nOrville,M,49\r\nLorenzo,M,48\r\nBruce,M,47\r\nMarcus,M,47\r\nPreston,M,47\r\nBob,M,46\r\nDock,M,46\r\nDonald,M,46\r\nJackson,M,46\r\nCecil,M,45\r\nBarney,M,44\r\nDelbert,M,44\r\nEdmond,M,44\r\nAnderson,M,43\r\nChristian,M,43\r\nGlenn,M,43\r\nJefferson,M,43\r\nLuke,M,43\r\nNeal,M,43\r\nBurt,M,42\r\nIke,M,42\r\nMyron,M,42\r\nTony,M,42\r\nConrad,M,41\r\nJoel,M,41\r\nMatt,M,41\r\nRiley,M,41\r\nVincent,M,41\r\nEmory,M,40\r\nIsaiah,M,40\r\nNick,M,40\r\nEzra,M,39\r\nGreen,M,39\r\nJuan,M,39\r\nClifton,M,38\r\nLucius,M,38\r\nPorter,M,38\r\nArnold,M,37\r\nBud,M,37\r\nJeremiah,M,37\r\nTaylor,M,37\r\nForrest,M,36\r\nRoland,M,36\r\nSpencer,M,35\r\nBurton,M,34\r\nDon,M,34\
r\nEmmet,M,34\r\nGustav,M,33\r\nLouie,M,33\r\nMorgan,M,33\r\nNed,M,33\r\nVan,M,33\r\nAmbrose,M,32\r\nChauncey,M,32\r\nElisha,M,32\r\nFerdinand,M,32\r\nGeneral,M,32\r\nJulian,M,32\r\nKenneth,M,32\r\nMitchell,M,32\r\nAllie,M,31\r\nJosh,M,31\r\nJudson,M,31\r\nLyman,M,31\r\nNapoleon,M,31\r\nPedro,M,31\r\nBerry,M,30\r\nDewitt,M,30\r\nErvin,M,30\r\nForest,M,30\r\nLynn,M,30\r\nPink,M,30\r\nRuben,M,30\r\nSanford,M,30\r\nWard,M,30\r\nDouglas,M,29\r\nOle,M,29\r\nOmer,M,29\r\nUlysses,M,29\r\nWalker,M,29\r\nWilbert,M,29\r\nAdelbert,M,28\r\nBenjiman,M,28\r\nIvan,M,28\r\nJonas,M,28\r\nMajor,M,28\r\nAbner,M,27\r\nArchibald,M,27\r\nCaleb,M,27\r\nClint,M,27\r\nDudley,M,27\r\nGranville,M,27\r\nKing,M,27\r\nMary,M,27\r\nMerton,M,27\r\nAntonio,M,26\r\nBennie,M,26\r\nCarroll,M,26\r\nFreeman,M,26\r\nJosiah,M,26\r\nMilo,M,26\r\nRoyal,M,26\r\nDick,M,25\r\nEarle,M,25\r\nElza,M,25\r\nEmerson,M,25\r\nFletcher,M,25\r\nJudge,M,25\r\nLaurence,M,25\r\nNeil,M,25\r\nRoger,M,25\r\nSeth,M,25\r\nGlen,M,24\r\nHugo,M,24\r\nJimmie,M,24\r\nJohnnie,M,24\r\nWashington,M,24\r\nElwood,M,23\r\nGust,M,23\r\nHarmon,M,23\r\nJordan,M,23\r\nSimeon,M,23\r\nWayne,M,23\r\nWilber,M,23\r\nClem,M,22\r\nEvan,M,22\r\nFrederic,M,22\r\nIrwin,M,22\r\nJunius,M,22\r\nLafayette,M,22\r\nLoren,M,22\r\nMadison,M,22\r\nMason,M,22\r\nOrval,M,22\r\nAbram,M,21\r\nAubrey,M,21\r\nElliott,M,21\r\nHans,M,21\r\nKarl,M,21\r\nMinor,M,21\r\nWash,M,21\r\nWilfred,M,21\r\nAllan,M,20\r\nAlphonse,M,20\r\nDallas,M,20\r\nDee,M,20\r\nIsiah,M,20\r\nJason,M,20\r\nJohnny,M,20\r\nLawson,M,20\r\nLew,M,20\r\nMicheal,M,20\r\nOrin,M,20\r\nAddison,M,19\r\nCal,M,19\r\nErastus,M,19\r\nFrancisco,M,19\r\nHardy,M,19\r\nLucien,M,19\r\nRandolph,M,19\r\nStewart,M,19\r\nVern,M,19\r\nWilmer,M,19\r\nZack,M,19\r\nAdrian,M,18\r\nAlvah,M,18\r\nBertram,M,18\r\nClay,M,18\r\nEphraim,M,18\r\nFritz,M,18\r\nGiles,M,18\r\nGrover,M,18\r\nHarris,M,18\r\nIsom,M,18\r\nJesus,M,18\r\nJohnie,M,18\r\nJonathan,M,18\r\nLucian,M,18\r\nMalcolm,M,18\r\nMerritt,M,18\r\nOtho,M,18\r\nPerley,M,18\
r\nRolla,M,18\r\nSandy,M,18\r\nTomas,M,18\r\nWilford,M,18\r\nAdolphus,M,17\r\nAngus,M,17\r\nArther,M,17\r\nCarlos,M,17\r\nCary,M,17\r\nCassius,M,17\r\nDavis,M,17\r\nHamilton,M,17\r\nHarve,M,17\r\nIsrael,M,17\r\nLeander,M,17\r\nMelville,M,17\r\nMerle,M,17\r\nMurray,M,17\r\nPleasant,M,17\r\nSterling,M,17\r\nSteven,M,17\r\nAxel,M,16\r\nBoyd,M,16\r\nBryant,M,16\r\nClement,M,16\r\nErwin,M,16\r\nEzekiel,M,16\r\nFoster,M,16\r\nFrances,M,16\r\nGeo,M,16\r\nHouston,M,16\r\nIssac,M,16\r\nJules,M,16\r\nLarkin,M,16\r\nMat,M,16\r\nMorton,M,16\r\nOrlando,M,16\r\nPierce,M,16\r\nPrince,M,16\r\nRollie,M,16\r\nRollin,M,16\r\nSim,M,16\r\nStuart,M,16\r\nWilburn,M,16\r\nBennett,M,15\r\nCasper,M,15\r\nChrist,M,15\r\nDell,M,15\r\nEgbert,M,15\r\nElmo,M,15\r\nFay,M,15\r\nGabriel,M,15\r\nHector,M,15\r\nHoratio,M,15\r\nLige,M,15\r\nSaul,M,15\r\nSmith,M,15\r\nSquire,M,15\r\nTobe,M,15\r\nTommie,M,15\r\nWyatt,M,15\r\nAlford,M,14\r\nAlma,M,14\r\nAlton,M,14\r\nAndres,M,14\r\nBurl,M,14\r\nCicero,M,14\r\nDean,M,14\r\nDorsey,M,14\r\nEnos,M,14\r\nHowell,M,14\r\nLou,M,14\r\nLoyd,M,14\r\nMahlon,M,14\r\nNat,M,14\r\nOmar,M,14\r\nOran,M,14\r\nParker,M,14\r\nRaleigh,M,14\r\nReginald,M,14\r\nRubin,M,14\r\nSeymour,M,14\r\nWm,M,14\r\nYoung,M,14\r\nBenjamine,M,13\r\nCarey,M,13\r\nCarlton,M,13\r\nEldridge,M,13\r\nElzie,M,13\r\nGarrett,M,13\r\nIsham,M,13\r\nJohnson,M,13\r\nLarry,M,13\r\nLogan,M,13\r\nMerrill,M,13\r\nMont,M,13\r\nOren,M,13\r\nPierre,M,13\r\nRex,M,13\r\nRodney,M,13\r\nTed,M,13\r\nWebster,M,13\r\nWest,M,13\r\nWheeler,M,13\r\nWillam,M,13\r\nAl,M,12\r\nAloysius,M,12\r\nAlvie,M,12\r\nAnna,M,12\r\nArt,M,12\r\nAugustine,M,12\r\nBailey,M,12\r\nBenjaman,M,12\r\nBeverly,M,12\r\nBishop,M,12\r\nClair,M,12\r\nCloyd,M,12\r\nColeman,M,12\r\nDana,M,12\r\nDuncan,M,12\r\nDwight,M,12\r\nEmile,M,12\r\nEvert,M,12\r\nHenderson,M,12\r\nHunter,M,12\r\nJean,M,12\r\nLem,M,12\r\nLuis,M,12\r\nMathias,M,12\r\nMaynard,M,12\r\nMiguel,M,12\r\nMortimer,M,12\r\nNels,M,12\r\nNorris,M,12\r\nPat,M,12\r\nPhil,M,12\r\nRush,M,12\r\nSanti
ago,M,12\r\nSol,M,12\r\nSydney,M,12\r\nThaddeus,M,12\r\nThornton,M,12\r\nTim,M,12\r\nTravis,M,12\r\nTruman,M,12\r\nWatson,M,12\r\nWebb,M,12\r\nWellington,M,12\r\nWinfred,M,12\r\nWylie,M,12\r\nAlec,M,11\r\nBasil,M,11\r\nBaxter,M,11\r\nBertrand,M,11\r\nBuford,M,11\r\nBurr,M,11\r\nCleveland,M,11\r\nColonel,M,11\r\nDempsey,M,11\r\nEarly,M,11\r\nEllsworth,M,11\r\nFate,M,11\r\nFinley,M,11\r\nGabe,M,11\r\nGarland,M,11\r\nGerald,M,11\r\nHerschel,M,11\r\nHezekiah,M,11\r\nJustus,M,11\r\nLindsey,M,11\r\nMarcellus,M,11\r\nOlaf,M,11\r\nOlin,M,11\r\nPablo,M,11\r\nRolland,M,11\r\nTurner,M,11\r\nVerne,M,11\r\nVolney,M,11\r\nWilliams,M,11\r\nAlmon,M,10\r\nAlois,M,10\r\nAlonza,M,10\r\nAnson,M,10\r\nAuthur,M,10\r\nBenton,M,10\r\nBillie,M,10\r\nCornelious,M,10\r\nDarius,M,10\r\nDenis,M,10\r\nDillard,M,10\r\nDoctor,M,10\r\nElvin,M,10\r\nEmma,M,10\r\nEric,M,10\r\nEvans,M,10\r\nGideon,M,10\r\nHaywood,M,10\r\nHilliard,M,10\r\nHosea,M,10\r\nLincoln,M,10\r\nLonzo,M,10\r\nLucious,M,10\r\nLum,M,10\r\nMalachi,M,10\r\nNewt,M,10\r\nNoel,M,10\r\nOrie,M,10\r\nPalmer,M,10\r\nPinkney,M,10\r\nShirley,M,10\r\nSumner,M,10\r\nTerry,M,10\r\nUrban,M,10\r\nUriah,M,10\r\nValentine,M,10\r\nWaldo,M,10\r\nWarner,M,10\r\nWong,M,10\r\nZeb,M,10\r\nAbel,M,9\r\nAlden,M,9\r\nArcher,M,9\r\nAvery,M,9\r\nCarson,M,9\r\nCullen,M,9\r\nDoc,M,9\r\nEben,M,9\r\nElige,M,9\r\nElizabeth,M,9\r\nElmore,M,9\r\nErnst,M,9\r\nFinis,M,9\r\nFreddie,M,9\r\nGodfrey,M,9\r\nGuss,M,9\r\nHamp,M,9\r\nHermann,M,9\r\nIsadore,M,9\r\nIsreal,M,9\r\nJones,M,9\r\nJune,M,9\r\nLacy,M,9\r\nLafe,M,9\r\nLeland,M,9\r\nLlewellyn,M,9\r\nLudwig,M,9\r\nManford,M,9\r\nMaxwell,M,9\r\nMinnie,M,9\r\nObie,M,9\r\nOctave,M,9\r\nOrrin,M,9\r\nOssie,M,9\r\nOswald,M,9\r\nPark,M,9\r\nParley,M,9\r\nRamon,M,9\r\nRice,M,9\r\nStonewall,M,9\r\nTheo,M,9\r\nTillman,M,9\r\nAddie,M,8\r\nAron,M,8\r\nAshley,M,8\r\nBernhard,M,8\r\nBertie,M,8\r\nBerton,M,8\r\nBuster,M,8\r\nButler,M,8\r\nCarleton,M,8\r\nCarrie,M,8\r\nClara,M,8\r\nClarance,M,8\r\nClare,M,8\r\nCrawford,M,8\r\nDanial,M,8\r
\nDayton,M,8\r\nDolphus,M,8\r\nElder,M,8\r\nEphriam,M,8\r\nFayette,M,8\r\nFelipe,M,8\r\nFernando,M,8\r\nFlem,M,8\r\nFlorence,M,8\r\nFord,M,8\r\nHarlan,M,8\r\nHayes,M,8\r\nHenery,M,8\r\nHoy,M,8\r\nHuston,M,8\r\nIda,M,8\r\nIvory,M,8\r\nJonah,M,8\r\nJustin,M,8\r\nLenard,M,8\r\nLeopold,M,8\r\nLionel,M,8\r\nManley,M,8\r\nMarquis,M,8\r\nMarshal,M,8\r\nMart,M,8\r\nOdie,M,8\r\nOlen,M,8\r\nOral,M,8\r\nOrley,M,8\r\nOtha,M,8\r\nPress,M,8\r\nPrice,M,8\r\nQuincy,M,8\r\nRandall,M,8\r\nRich,M,8\r\nRichmond,M,8\r\nRomeo,M,8\r\nRussel,M,8\r\nRutherford,M,8\r\nShade,M,8\r\nShelby,M,8\r\nSolon,M,8\r\nThurman,M,8\r\nTilden,M,8\r\nTroy,M,8\r\nWoodson,M,8\r\nWorth,M,8\r\nAden,M,7\r\nAlcide,M,7\r\nAlf,M,7\r\nAlgie,M,7\r\nArlie,M,7\r\nBart,M,7\r\nBedford,M,7\r\nBenito,M,7\r\nBilly,M,7\r\nBird,M,7\r\nBirt,M,7\r\nBruno,M,7\r\nBurley,M,7\r\nChancy,M,7\r\nClaus,M,7\r\nCliff,M,7\r\nClovis,M,7\r\nConnie,M,7\r\nCreed,M,7\r\nDelos,M,7\r\nDuke,M,7\r\nEber,M,7\r\nEligah,M,7\r\nElliot,M,7\r\nElton,M,7\r\nEmmitt,M,7\r\nGene,M,7\r\nGolden,M,7\r\nHal,M,7\r\nHardin,M,7\r\nHarman,M,7\r\nHervey,M,7\r\nHollis,M,7\r\nIvey,M,7\r\nJennie,M,7\r\nLen,M,7\r\nLindsay,M,7\r\nLonie,M,7\r\nLyle,M,7\r\nMac,M,7\r\nMal,M,7\r\nMath,M,7\r\nMiller,M,7\r\nOrson,M,7\r\nOsborne,M,7\r\nPercival,M,7\r\nPleas,M,7\r\nPles,M,7\r\nRafael,M,7\r\nRaoul,M,7\r\nRoderick,M,7\r\nRose,M,7\r\nShelton,M,7\r\nSid,M,7\r\nTheron,M,7\r\nTobias,M,7\r\nToney,M,7\r\nTyler,M,7\r\nVance,M,7\r\nVivian,M,7\r\nWalton,M,7\r\nWatt,M,7\r\nWeaver,M,7\r\nWilton,M,7\r\nAdolf,M,6\r\nAlbin,M,6\r\nAlbion,M,6\r\nAllison,M,6\r\nAlpha,M,6\r\nAlpheus,M,6\r\nAnastacio,M,6\r\nAndre,M,6\r\nAnnie,M,6\r\nArlington,M,6\r\nArmand,M,6\r\nAsberry,M,6\r\nAsbury,M,6\r\nAsher,M,6\r\nAugustin,M,6\r\nAuther,M,6\r\nAuthor,M,6\r\nBallard,M,6\r\nBlas,M,6\r\nCaesar,M,6\r\nCandido,M,6\r\nCato,M,6\r\nClarke,M,6\r\nClemente,M,6\r\nColin,M,6\r\nCommodore,M,6\r\nCora,M,6\r\nCoy,M,6\r\nCruz,M,6\r\nCurt,M,6\r\nDamon,M,6\r\nDavie,M,6\r\nDelmar,M,6\r\nDexter,M,6\r\nDora,M,6\r\nDoss,M,6\r\nDr
ew,M,6\r\nEdson,M,6\r\nElam,M,6\r\nElihu,M,6\r\nEliza,M,6\r\nElsie,M,6\r\nErie,M,6\r\nErnie,M,6\r\nEthel,M,6\r\nFerd,M,6\r\nFriend,M,6\r\nGarry,M,6\r\nGary,M,6\r\nGrace,M,6\r\nGustaf,M,6\r\nHallie,M,6\r\nHampton,M,6\r\nHarrie,M,6\r\nHattie,M,6\r\nHence,M,6\r\nHillard,M,6\r\nHollie,M,6\r\nHolmes,M,6\r\nHope,M,6\r\nHyman,M,6\r\nIshmael,M,6\r\nJarrett,M,6\r\nJessee,M,6\r\nJoeseph,M,6\r\nJunious,M,6\r\nKirk,M,6\r\nLevy,M,6\r\nMervin,M,6\r\nMichel,M,6\r\nMilford,M,6\r\nMitchel,M,6\r\nNellie,M,6\r\nNoble,M,6\r\nObed,M,6\r\nOda,M,6\r\nOrren,M,6\r\nOttis,M,6\r\nRafe,M,6\r\nRedden,M,6\r\nReese,M,6\r\nRube,M,6\r\nRuby,M,6\r\nRupert,M,6\r\nSalomon,M,6\r\nSammie,M,6\r\nSanders,M,6\r\nSoloman,M,6\r\nStacy,M,6\r\nStanford,M,6\r\nStanton,M,6\r\nThad,M,6\r\nTitus,M,6\r\nTracy,M,6\r\nVernie,M,6\r\nWendell,M,6\r\nWilhelm,M,6\r\nWillian,M,6\r\nYee,M,6\r\nZeke,M,6\r\nAb,M,5\r\nAbbott,M,5\r\nAgustus,M,5\r\nAlbertus,M,5\r\nAlmer,M,5\r\nAlphonso,M,5\r\nAlvia,M,5\r\nArtie,M,5\r\nArvid,M,5\r\nAshby,M,5\r\nAugusta,M,5\r\nAurthur,M,5\r\nBabe,M,5\r\nBaldwin,M,5\r\nBarnett,M,5\r\nBartholomew,M,5\r\nBarton,M,5\r\nBernie,M,5\r\nBlaine,M,5\r\nBoston,M,5\r\nBrad,M,5\r\nBradford,M,5\r\nBradley,M,5\r\nBrooks,M,5\r\nBuck,M,5\r\nBudd,M,5\r\nCeylon,M,5\r\nChalmers,M,5\r\nChesley,M,5\r\nChin,M,5\r\nCleo,M,5\r\nCrockett,M,5\r\nCyril,M,5\r\nDaisy,M,5\r\nDenver,M,5\r\nDow,M,5\r\nDuff,M,5\r\nEdie,M,5\r\nEdith,M,5\r\nElick,M,5\r\nElie,M,5\r\nEliga,M,5\r\nEliseo,M,5\r\nElroy,M,5\r\nEly,M,5\r\nEnnis,M,5\r\nEnrique,M,5\r\nErasmus,M,5\r\nEsau,M,5\r\nEverette,M,5\r\nFirman,M,5\r\nFleming,M,5\r\nFlora,M,5\r\nGardner,M,5\r\nGee,M,5\r\nGorge,M,5\r\nGottlieb,M,5\r\nGregorio,M,5\r\nGregory,M,5\r\nGustavus,M,5\r\nHalsey,M,5\r\nHandy,M,5\r\nHardie,M,5\r\nHarl,M,5\r\nHayden,M,5\r\nHays,M,5\r\nHermon,M,5\r\nHershel,M,5\r\nHolly,M,5\r\nHosteen,M,5\r\nHoyt,M,5\r\nHudson,M,5\r\nHuey,M,5\r\nHumphrey,M,5\r\nHunt,M,5\r\nHyrum,M,5\r\nIrven,M,5\r\nIsam,M,5\r\nIvy,M,5\r\nJabez,M,5\r\nJewel,M,5\r\nJodie,M,5\r\nJudd,M,5\r\nJulious,M,
5\r\nJustice,M,5\r\nKatherine,M,5\r\nKelly,M,5\r\nKit,M,5\r\nKnute,M,5\r\nLavern,M,5\r\nLawyer,M,5\r\nLayton,M,5\r\nLeonidas,M,5\r\nLewie,M,5\r\nLillie,M,5\r\nLinwood,M,5\r\nLoran,M,5\r\nLorin,M,5\r\nMace,M,5\r\nMalcom,M,5\r\nManly,M,5\r\nManson,M,5\r\nMatthias,M,5\r\nMattie,M,5\r\nMerida,M,5\r\nMiner,M,5\r\nMontgomery,M,5\r\nMoroni,M,5\r\nMurdock,M,5\r\nMyrtle,M,5\r\nNate,M,5\r\nNathanial,M,5\r\nNimrod,M,5\r\nNora,M,5\r\nNorval,M,5\r\nNova,M,5\r\nOrion,M,5\r\nOrla,M,5\r\nOrrie,M,5\r\nPayton,M,5\r\nPhilo,M,5\r\nPhineas,M,5\r\nPresley,M,5\r\nRansom,M,5\r\nReece,M,5\r\nRene,M,5\r\nRoswell,M,5\r\nRowland,M,5\r\nSampson,M,5\r\nSamual,M,5\r\nSantos,M,5\r\nSchuyler,M,5\r\nSheppard,M,5\r\nSpurgeon,M,5\r\nStarling,M,5\r\nSylvanus,M,5\r\nTheadore,M,5\r\nTheophile,M,5\r\nTilmon,M,5\r\nTommy,M,5\r\nUnknown,M,5\r\nVann,M,5\r\nWes,M,5\r\nWinston,M,5\r\nWood,M,5\r\nWoodie,M,5\r\nWorthy,M,5\r\nWright,M,5\r\nYork,M,5\r\nZachariah,M,5\r\n')

Other ways to initialize an RDD include files on HDFS, databases and tables in Hive, and results returned by Spark SQL. They are not covered here.

RDD Transformation

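(1) RDDs can be turned into new RDDs through a series of transformations, somewhat like list comprehensions. The most commonly used transformations on RDDs are:

map() applies the same operation to every item of the RDD
flatMap() applies the same operation to every item, where each call returns a list, and then flattens all of those lists into a single new collection
filter() keeps only the items that satisfy a condition
distinct() removes duplicate items from the RDD
sample() draws a sample of the items in the RDD, with or without replacement
sortBy() sorts the items in the RDD

To inspect the result of an operation, use the action collect(), which returns all items as a Python list. On large datasets collect() is dangerous, because it pulls every item into the driver's memory.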

import pyspark
from pyspark import SparkContext  # import the class under its own name; sc is assigned below
from pyspark import SparkConf
conf=SparkConf().setAppName("miniProject").setMaster("local[*]")
sc=SparkContext.getOrCreate(conf)

numbersRDD = sc.parallelize(range(1,10+1))
print(numbersRDD.collect())

# map() applies the same operation to every item of the RDD
squaresRDD = numbersRDD.map(lambda x: x**2)  # Square every number
print(squaresRDD.collect())

# filter() keeps only the items that satisfy a condition
filteredRDD = numbersRDD.filter(lambda x: x % 2 == 0)  # Only the evens
print(filteredRDD.collect())

#Output:
#[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
#[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
#[2, 4, 6, 8, 10]
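
import pyspark
from pyspark import SparkContext  # import the class under its own name; sc is assigned below
from pyspark import SparkConf
conf=SparkConf().setAppName("miniProject").setMaster("local[*]")
sc=SparkContext.getOrCreate(conf)

# flatMap() applies the same operation to every item, each call returning a list, and then flattens all of those lists into one new collection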
sentencesRDD=sc.parallelize(['Hello world','My name is Patrick'])
wordsRDD=sentencesRDD.flatMap(lambda sentence: sentence.split(" "))
print(wordsRDD.collect())
print(wordsRDD.count())

#Output:
#['Hello', 'world', 'My', 'name', 'is', 'Patrick']
#6

By comparison:
map would give [['Hello', 'world'], ['My', 'name', 'is', 'Patrick']],
whereas flatMap flattens everything into ['Hello', 'world', 'My', 'name', 'is', 'Patrick'].
flatMap corresponds to the following plain Python operation:

l = ['Hello world', 'My name is Patrick']
ll = []
for sentence in l:
    ll = ll + sentence.split(" ")  # + concatenates two lists
ll
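
As a side note, the same flattening can be written with itertools.chain.from_iterable from the standard library. The snippet below is only a plain-Python sketch of what map and flatMap compute on this input, not Spark code, and the variable names are illustrative.

```python
from itertools import chain

sentences = ['Hello world', 'My name is Patrick']

# map-like step: one list of words per sentence, so the result is nested
nested = [sentence.split(" ") for sentence in sentences]

# flatMap-like step: flatten the nested lists into a single list
flat = list(chain.from_iterable(nested))

print(nested)
print(flat)
```

Unlike the loop with +, this does not rebuild the accumulated list on every iteration.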

(2) The transformations listed at the start can be chained one after another, for example:

import pyspark
from pyspark import SparkContext  # import the class under its own name; sc is assigned below
from pyspark import SparkConf
conf=SparkConf().setAppName("miniProject").setMaster("local[*]")
sc=SparkContext.getOrCreate(conf)

def doubleIfOdd(x):
    if x % 2 == 1:
        return 2 * x
    else:
        return x
numbersRDD = sc.parallelize(range(1,10+1))
resultRDD = (numbersRDD          
             .map(doubleIfOdd)    # chain map, filter and distinct
             .filter(lambda x: x > 6)
             .distinct())    # distinct() removes duplicate items
resultRDD.collect()

#Output:[8, 10, 18, 14]

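(3) For more complex structures, such as the key-value pairs (key, value) stored as tuples and known as "pair RDDs", Spark defines a number of dedicated transformations and actions:

reduceByKey(): performs a reduce over all items that share the same key
groupByKey(): returns an RDD of (key, listOfValues) tuples, where the list holds all values under the same key
sortByKey(): sorts by key
countByKey(): counts the number of items per key
collectAsMap(): similar to collect(), but returns a dictionary of key-value pairs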

import pyspark
from pyspark import SparkContext  # import the class under its own name; sc is assigned below
from pyspark import SparkConf
conf=SparkConf().setAppName("miniProject").setMaster("local[*]")
sc=SparkContext.getOrCreate(conf)

rdd=sc.parallelize(["Hello hello", "Hello New York", "York says hello"])
resultRDD=(rdd
    .flatMap(lambda sentence:sentence.split(" "))  # split each sentence into words
    .map(lambda word:word.lower())                 # normalize to lowercase
    .map(lambda word:(word, 1))                    # map each word to (word, 1)
    .reduceByKey(lambda x, y: x + y))              # sum the values of items that share a key
resultRDD.collect()

#Output:[('hello', 4), ('york', 2), ('says', 1), ('new', 1)]

result = resultRDD.collectAsMap()  # like collect(), but returns a dictionary of key-value pairs
result

#Output:{'hello': 4, 'new': 1, 'says': 1, 'york': 2}

resultRDD.sortByKey(ascending=True).take(2)  # sortByKey sorts by key

#Output:[('hello', 4), ('new', 1)]

# Take the 2 most frequent words
print(resultRDD
      .sortBy(lambda x: x[1], ascending=False)
      .take(2))

#Output:[('hello', 4), ('york', 2)]
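
To make the semantics of reduceByKey concrete, here is a plain-Python sketch that reproduces the word count above without Spark. The helper name reduce_by_key is hypothetical and chosen only for illustration. It folds the values that share a key with the given function, which is what reduceByKey does conceptually; Spark additionally performs the fold inside each partition before merging across partitions.

```python
def reduce_by_key(pairs, func):
    """Hypothetical helper: fold the values of equal keys with func."""
    result = {}
    for key, value in pairs:
        if key in result:
            result[key] = func(result[key], value)
        else:
            result[key] = value
    return result

sentences = ["Hello hello", "Hello New York", "York says hello"]

# Same pipeline as above: split, lowercase, pair each word with 1
pairs = [(word.lower(), 1) for s in sentences for word in s.split(" ")]

counts = reduce_by_key(pairs, lambda x, y: x + y)

# Equivalent of sortBy(value, descending).take(2)
top2 = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:2]

print(counts)
print(top2)
```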

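Operations between RDDs

(1) Given two RDDs, the following set-like operations combine them into one new RDD:

rdd1.union(rdd2): all items from rdd1 and rdd2 combined (duplicates are kept)
rdd1.intersection(rdd2): the items common to rdd1 and rdd2
rdd1.subtract(rdd2): all items that are in rdd1 but not in rdd2 (set difference)
rdd1.cartesian(rdd2): the Cartesian product of the items of rdd1 and rdd2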

import pyspark
from pyspark import SparkContext  # import the class under its own name; sc is assigned below
from pyspark import SparkConf
conf=SparkConf().setAppName("miniProject").setMaster("local[*]")
sc=SparkContext.getOrCreate(conf)

# Initialize two RDDs
numbersRDD = sc.parallelize([1,2,3])
moreNumbersRDD = sc.parallelize([2,3,4])
numbersRDD.union(moreNumbersRDD).collect()  # union(): all items from both RDDs, duplicates kept

#Output:[1, 2, 3, 2, 3, 4]

numbersRDD.intersection(moreNumbersRDD).collect()  # intersection(): items common to both

#Output:[2, 3]

numbersRDD.subtract(moreNumbersRDD).collect()  # subtract(): items in the first RDD only

#Output:[1]

numbersRDD.cartesian(moreNumbersRDD).collect()  # cartesian(): Cartesian product

#Output:[(1, 2), (1, 3), (1, 4), (2, 2), (2, 3), (2, 4), (3, 2), (3, 3), (3, 4)]
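
For readers more familiar with plain Python, the four operations have rough local analogues. This is only a sketch for intuition, run on lists rather than RDDs, with illustrative variable names. Note in particular that union keeps duplicates, unlike a Python set union, and that Spark does not guarantee the order of its results.

```python
from itertools import product

a = [1, 2, 3]
b = [2, 3, 4]

union = a + b                                    # like rdd1.union(rdd2): duplicates kept
intersection = sorted(set(a) & set(b))           # like rdd1.intersection(rdd2)
difference = [x for x in a if x not in set(b)]   # like rdd1.subtract(rdd2)
cartesian = list(product(a, b))                  # like rdd1.cartesian(rdd2)

print(union, intersection, difference, cartesian)
```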

(2) Given two pair RDDs, they can also be joined on their keys in a SQL-like way:

import pyspark
from pyspark import SparkContext  # import the class under its own name; sc is assigned below
from pyspark import SparkConf
conf=SparkConf().setAppName("miniProject").setMaster("local[*]")
sc=SparkContext.getOrCreate(conf)

# Home of different people
homesRDD = sc.parallelize([
        ('Brussels', 'John'),
        ('Brussels', 'Jack'),
        ('Leuven', 'Jane'),
        ('Antwerp', 'Jill'),
    ])

# Quality of life index for various cities
lifeQualityRDD = sc.parallelize([
        ('Brussels', 10),
        ('Antwerp', 7),
        ('RestOfFlanders', 5),
    ])
homesRDD.join(lifeQualityRDD).collect()   #join

#Output:
#[('Antwerp', ('Jill', 7)),
# ('Brussels', ('John', 10)),
# ('Brussels', ('Jack', 10))]

homesRDD.leftOuterJoin(lifeQualityRDD).collect()   #leftOuterJoin

#Output:
#[('Antwerp', ('Jill', 7)),
# ('Leuven', ('Jane', None)),
# ('Brussels', ('John', 10)),
# ('Brussels', ('Jack', 10))]

homesRDD.rightOuterJoin(lifeQualityRDD).collect()   #rightOuterJoin

#Output:
#[('Antwerp', ('Jill', 7)),
# ('RestOfFlanders', (None, 5)),
# ('Brussels', ('John', 10)),
# ('Brussels', ('Jack', 10))]

homesRDD.cogroup(lifeQualityRDD).collect()   #cogroup

#Output:
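#[('Antwerp',
#  (<pyspark.resultiterable.ResultIterable object at 0x...>,
#   <pyspark.resultiterable.ResultIterable object at 0x...>)),
# ('RestOfFlanders',
#  (<pyspark.resultiterable.ResultIterable object at 0x...>,
#   <pyspark.resultiterable.ResultIterable object at 0x...>)),
# ('Leuven',
#  (<pyspark.resultiterable.ResultIterable object at 0x...>,
#   <pyspark.resultiterable.ResultIterable object at 0x...>)),
# ('Brussels',
#  (<pyspark.resultiterable.ResultIterable object at 0x...>,
#   <pyspark.resultiterable.ResultIterable object at 0x...>))]
# Oops!  Those ResultIterables are Spark's way of returning a list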
# that we can walk over, without materializing the list.
# Let's materialize the lists to make the above more readable:
(homesRDD
 .cogroup(lifeQualityRDD)
 .map(lambda x:(x[0], (list(x[1][0]), list(x[1][1]))))
 .collect())

#Output:
#[('Antwerp', (['Jill'], [7])),
# ('RestOfFlanders', ([], [5])),
# ('Leuven', (['Jane'], [])),
# ('Brussels', (['John', 'Jack'], [10]))]
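
The difference between join and leftOuterJoin in the example above can be sketched in plain Python on the same data. The helpers inner_join and left_outer_join are hypothetical names used only for illustration; they mimic the shape of the results and are not how Spark implements joins.

```python
homes = [('Brussels', 'John'), ('Brussels', 'Jack'),
         ('Leuven', 'Jane'), ('Antwerp', 'Jill')]
quality = [('Brussels', 10), ('Antwerp', 7), ('RestOfFlanders', 5)]

def inner_join(left, right):
    """Hypothetical helper: pair up values whose keys match on both sides."""
    return [(k, (lv, rv)) for k, lv in left for rk, rv in right if k == rk]

def left_outer_join(left, right):
    """Hypothetical helper: keep every left item, using None when no match exists."""
    right_keys = {k for k, _ in right}
    matched = inner_join(left, right)
    unmatched = [(k, (lv, None)) for k, lv in left if k not in right_keys]
    return matched + unmatched

joined = inner_join(homes, quality)
left_joined = left_outer_join(homes, quality)

print(joined)
print(left_joined)
```

Leuven has no entry in the quality data, so it is dropped by the inner join and paired with None by the left outer join.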

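Lazy evaluation and actions

Important: lazy evaluation is a core concept in Spark. When you transform one RDD into another, the transformation is not executed immediately. Spark merely records it, and only when an action actually needs the result does it organize and run the recorded transformations (there may be a whole chain of them). This avoids unnecessary storage of intermediate results and unnecessary communication.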
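
Python's built-in map is also lazy, which gives a rough local analogy. The sketch below is plain Python rather than Spark, and Spark's actual mechanism (recording a lineage of transformations) is different. It only illustrates the idea that declaring a transformation does no work until something consumes the result. The names log and square are illustrative.

```python
log = []

def square(x):
    log.append(x)      # record that the function really ran
    return x * x

numbers = range(1, 6)
squares = map(square, numbers)   # declared, but nothing is computed yet
calls_before = len(log)

result = list(squares)           # consuming the iterator triggers the work
calls_after = len(log)

print(calls_before, calls_after, result)
```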

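Common actions are listed below. When one of them is called, the transformations defined before it are actually executed:

collect(): computes all items and returns them to the driver as a Python list
first(): like collect(), but returns only the first item
take(n): likewise, but returns the first n items
count(): counts the number of items in the RDD
top(n): returns the n largest items, in descending order of their natural ordering
reduce(): aggregates the items of the RDD with a binary function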

import pyspark
from pyspark import SparkContext  # import the class under its own name; sc is assigned below
from pyspark import SparkConf
conf=SparkConf().setAppName("miniProject").setMaster("local[*]")
sc=SparkContext.getOrCreate(conf)

rdd = sc.parallelize(range(1,10+1))
rdd.reduce(lambda x, y: x + y)  # reduce(): aggregate the items of the RDD

#Output:55

How reduce works: the reduce is first performed within each partition, and the partial results are then reduced globally.
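
The two-stage idea can be sketched in plain Python with functools.reduce. The helper split_into_partitions is hypothetical and simply cuts the data into contiguous chunks; Spark's real partitioning may distribute items differently. Because the partial results are combined in a second step, the reduce function should be associative and commutative.

```python
from functools import reduce
from operator import add

def split_into_partitions(items, num_partitions):
    """Hypothetical helper: cut items into contiguous chunks."""
    items = list(items)
    size = -(-len(items) // num_partitions)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

partitions = split_into_partitions(range(1, 11), 4)

# Stage 1: reduce inside each partition
partial = [reduce(add, p) for p in partitions]

# Stage 2: reduce the partial results globally
total = reduce(add, partial)

print(partitions, partial, total)
```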

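Sometimes the RDD produced by a sequence of transformations is needed more than once. Recomputing it each time is costly, so it can be kept in memory temporarily with cache(). Caching is very useful for iterative workloads, such as many machine learning algorithms whose training loops pass over the same data repeatedly.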

import pyspark
from pyspark import SparkContext  # import the class under its own name; sc is assigned below
from pyspark import SparkConf
conf=SparkConf().setAppName("miniProject").setMaster("local[*]")
sc=SparkContext.getOrCreate(conf)
import numpy as np
numbersRDD = sc.parallelize(np.linspace(1.0, 10.0, 10))
squaresRDD = numbersRDD.map(lambda x: x**2)

squaresRDD.cache()  # Preserve the actual items of this RDD in memory

avg = squaresRDD.reduce(lambda x, y: x + y) / squaresRDD.count()
print(avg)

#Output:38.5
