Development environment
- macOS High Sierra - Apple
- Emacs (Text Editor)
- Python 3.6 (programming language)
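Working through exercises 7 and 8 of section 1.8 (Exercises) in Chapter 1 (Language Processing and Python) of 入門 自然言語処理 (Natural Language Processing with Python, by Steven Bird, Ewan Klein, and Edward Loper; Japanese translation by 萩原 正人, 中山 敬広, and 水野 貴明; O'Reilly Japan).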
Input and output (Terminal, Jupyter (IPython))
$ ipython
Python 3.6.4 (default, Dec 21 2017, 20:33:21)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908

In [2]: text5.collocations()
wanna chat; PART JOIN; MODE #14-19teens; JOIN PART; PART PART;
cute.-ass MP3; MP3 player; JOIN JOIN; times .. .; ACTION watches; guys
wanna; song lasts; last night; ACTION sits; -...)...- S.M.R.; Lime
Player; Player 12%; dont know; lez gurls; long time

In [3]: len(bigrms(text5))
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-3-b482f4408ef8> in <module>()
----> 1 len(bigrms(text5))

NameError: name 'bigrms' is not defined

In [4]: bigrams
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-4-c91f40429cac> in <module>()
----> 1 bigrams

NameError: name 'bigrams' is not defined

In [5]: from nltk import bigrams

In [6]: len(bigrams(text5))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-d9ecdddfa8ef> in <module>()
----> 1 len(bigrams(text5))

TypeError: object of type 'generator' has no len()

In [7]: len(list(bigrams(text5)))
Out[7]: 45009

In [8]: bigrams(text5)[:5]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-5e112c95e39a> in <module>()
----> 1 bigrams(text5)[:5]

TypeError: 'generator' object is not subscriptable

In [9]: list(bigrams(text5))[:5]
Out[9]: [('now', 'im'), ('im', 'left'), ('left', 'with'), ('with', 'this'), ('this', 'gay')]

In [10]: set(text4) # the set of words contained in text4
Out[10]: {'dispose', 'schoolchildren', 'legitimately', 'learned', 'calmly', 'usury', 'organization', 'deprive', 'lightning', 'incalculable', 'avoided', 'definite', 'suggesting', 'protects', 'grim', 'instances', 'tariffs', 'window', 'accruing', 'extinction', 'performance', 'removable', 'FAILURE', 'lightening', 'preconceived', 'indignant', 'enjoins', 'Italy', 'matches', 'heated', 'shadow', 'old', 'prudence', 'risen', 'fairer', 'institutions', 'harm', 'loyally', 'promptness', 'temptation', 'tempt', 'humanize', 'send', 'resting', 'big', 'supplying', '¡¦', 'compose', 'perish', 'nursery', 'chairs', 'engraven', 'determination', 'Surely', 'national', 'endeavor', 'directing', 'antifederal', 'redress', 'economical', 'magnificent', 'From', 'fed', 'firmer', 'alteration', 'abuse', 'company', 'prevention', 'extend', 'raising', 'keeps', 'compliment', 'scales', 'shadows', 'AIDS', 'corrupted', 'abreast', 'declaration', 'prodigal', 'modem', 'resolves', 'oppressive', 'shudder', 'barter', 'moment', 'responsibilities', 'accept', 'majorities', 'Commerce', 'smoothly', 'motive', 'chargeable', 'discharged', 'lifted', 'aloof', 'sometimes', 'confronting', 'Xviolence', 'fine', '...', 'Amidst', 'constitutional', 'telegraph', 'fare', 'K', 'specify', 'prudent', 'dust', 'self', 'Budapest', 'imposing', 'disloyal', 'ushering', 'avoidance', 'surmount', 'ores', 'shelter', 'planting', 'Stars', 'disorders', 'sinister', 'recommendations', 'navigable', 'aggravation', 'concentrating', 'reputation', 'counter', 'Congressman', 'possession', 'college', 'judicious', 'measures', 'maritime', 'excellent', 'commonly', 'smuggled', 'rescind', 'decayed', 'buildup', 'frequency', 'interpreters', 'unreasonable',
'spoke', 'ants', 'between', 'clauses', 'proposition', 'thousands', 'respected', 'five', 'recovered', 'simmer', 'authors', 'vow', 'functionaries', 'victorious', 'sorrowful', 'fallacy', 'ends', 'Treasury', 'efficiently', 'delivered', 'avert', 'unhampered', 'CONGRESS', 'hopefulness', 'mark', 'possible', 'Has', 'responsibility', 'unbounded', 'victim', 'substantial', 'retarded', 'hardheartedness', 'qualification', 'Nebraska', 'degeneration', 'circulation', 'Yes', 'passes', 'rightfully', 'juncture', 'Rome', 'express', 'temperate', 'restrain', 'face', 'wherever', 'dishonor', 'heroes', 'deserted', 'stanch', 'stated', 'amidst', '2', 'seemed', 'Encountering', 'Magna', 'Senate', 'released', 'tensely', '8', 'child', 'contingency', 'quite', 'irrevocable', 'Eve', 'indulged', 'notification', 'attends', 'reaches', 'reversion', 'vindictive', 'vigor', 'excitement', 'possibilities', 'inconvenient', 'row', 'confers', 'inculcating', 'Julia', 'cleaner', 'outlays', 'intermission', '14th', 'replace', 'January', 'averted', 'Normandy', 'Monday', 'silk', 'console', 'moreover', 'judgment', 'uncounted', 'safe', 'enjoyed', 'intuitions', 'discriminate', 'rid', 'embittered', 'Forge', 'stretching', 'propriety', 'direct', '....', 'investigate', 'ponders', 'predicted', 'fight', 'transform', 'conflict', 'assigns', 'reasonably', 'inexcusable', 'particulars', 'reclamation', 'obtaining', 'fell', 'unifying', 'inconsistencies', 'annual', 'Egypt', 'deliberate', 'despaired', 'intentioned', 'respectfully', 'governmental', 'priorities', 'doing', '1', 'dress', 'Kindly', 'civility', 'Bush', 'probing', 'forthwith', 'scrutiny', 'abound', 'them', 'cosmos', 'excursions', '1817', 'franchise', 'obliteration', 'pitilessly', 'subjects', 'regulated', 'applicable', 'exhibited', 'articles', 'field', 'navies', 'precise', 'seized', 'Panama', 'hardier', 'hospitality', 'Territorial', 'amended', 'involvement', 'discouragement', 'unitedly', 'profit', 'tactic', 'revocation', 'imprudent', 'detachment', 'promptitude', 'casts', 
'accumulated', 'whole', 'effort', 'Terrific', 'accumulation', 'badge', 'oftener', 'promoting', 'righteous', 'beneficence', 'struggled', 'blazed', 'pledged', 'served', 'Genius', 'unkept', 'rejecting', 'entitled', 'unleash', 'detect', 'sincerity', 'covenants', 'appeasement', 'ballot', 'enterprising', 'further', 'reform', 'Before', 'golden', 'negotiated', 'manufacturer', 'condemned', 'force', 'centers', 'neutrality', 'contraction', 'respite', 'intervening', 'emigrating', 'unhappy', 'breathing', 'beauty', 'sacrifices', 'deepening', 'rewards', 'wherein', 'This', 'patent', 'lighted', 'undue', 'legible', 'reflect', 'unfaithful', 'increased', 'fuel', 'benefited', 'spiritually', 'icy', 'warrant', 'heavens', 'erect', 'illumined', 'Orient', 'din', 'distinction', 'omitting', 'exacted', 'York', 'perfecting', 'settler', 'crushes', 'adopted', 'bestowal', 'acquired', '19th', 'briefly', 'usurper', 'friendly', 'healed', 'fearfully', 'pile', 'dreamed', 'evenly', 'inefficiently', 'report', 'lakes', 'artifice', 'interstate', 'spreading', 'succeed', 'makeup', 'seize', 'figures', 'achieve', 'promotions', 'delineated', 'concerted', 'early', 'They', 'concepts', 'devised', 'allows', 'Cincinnati', 'Persistent', 'occupying', 'participate', 'alienate', 'ability', 'mostly', 'frauds', 'impoverished', 'missiles', 'privileged', 'weapon', 'convinced', 'obstructed', 'session', 'expensive', 'canvass', 'Or', 'saying', 'travelled', 'midst', 'immigration', 'opening', 'ensign', 'depths', 'chords', 'reduce', 'None', 'agitated', 'chattel', 'mightiest', 'overrule', 'twilight', '4th', 'consequential', 'voluntarily', 'foes', 'Only', 'jurisprudence', 'defines', 'exchanges', 'Athens', 'rightly', 'dignified', 'cabbies', 'august', 'ax', 'blinded', 'added', 'victory', 'founding', 'text', 'committed', 'followed', 'fields', 'qualifications', 'instead', 'helping', 'pronounce', 'management', 'apprehension', 'except', 'night', 'neighbors', 'repeal', 'earlier', 'hour', 'gloomy', 'uncontrolled', 'uncomplaining', 
'occasional', 'pool', 'weighty', 'administration', 'traces', 'illegal', 'staple', 'evacuation', 'bred', 'sheet', 'hoping', 'trappings', 'shrinking', 'maketh', 'variance', 'Comfort', 'pretensions', 'persistence', 'likewise', 'nuclear', 'knees', 'collected', 'summons', 'naturally', 'approached', 'constitution', 'maturing', 'infirmity', 'usages', 'availed', 'lifting', 'alter', 'weigh', 'hopes', 'arsenal', 'enlarging', 'assured', 'parties', 'here', 'discussion', 'Roman', 'dictatorship', 'thrown', 'resume', 'paces', 'transcending', 'warmth', 'undiminished', 'pre', 'Freedom', 'coast', 'Thy', 'basic', 'task', 'Putting', 'estranged', 'convenience', 'treasury', 'studying', 'Missouri', 'benefits', 'propagation', 'extraneous', 'coal', 'sparing', 'spasmodic', 'plunge', 'have', 'lawlessness', 'plans', 'convulsed', 'data', 'pleasures', 'ensued', 'durability', 'waited', 'Believing', 'calculation', 'Social', 'happiness', 't', 'entangling', 'Old', 'exploded', 'wonted', 'rush', 'diffidence', 'relationship', 'warfare', 'launched', 'foreigners', 'diseases', 'addresses', 'percentage', 'ignorant', 'inhabitant', 'governing', 'Commissioners', 'generosity', 'Thirty', '-', 'decoding', 'violated', 'radiance', 'forbearance', 'sages', 'aright', 'hadn', 'YOUNG', 'devices', 'Labor', 'prepare', 'steady', 'equals', 'disappeared', 'services', 'pursuance', 'Act', 'herself', 'industrialists', 'ingenuity', 'generation', 'draw', 'Mississippi', 'maximum', 'yearn', 'restoration', '1890', 'neck', 'Action', 'scrutinize', 'nameless', 'pleasantness', 'coercion', 'Conscious', 'care', 'house', 'immigrant', 'fiscally', 'unfulfilled', 'touchstone', 'overtake', 'inspection', 'nourishes', 'monopolies', 'affords', 'inescapably', 'when', 'message', 'attainment', 'gave', 'mutation', 'studies', 'inexorable', 'discrimination', 'territorial', 'roll', 'antiphilosophists', 'Indeed', 'operatives', 'Although', 'urging', 'fervently', 'stamping', 'hasten', 'sprang', 'maturity', 'stricken', 'unnecessary', 'uncharitableness', 
'disunion', 'expense', 'Asia', 'ethnic', 'gentlemen', 'Cabinet', 'materially', 'purchasing', 'accrue', 'England', 'unbiased', 'positively', 'auspices', 'offensive', 'metallic', 'clarification', 'saved', 'intuitive', 'recital', 'wonders', 'transfer', 'retrenchment', 'marker', 'Florida', 'sanctioning', 'ancient', 'freedom', 'shaken', 'bastion', 'radical', 'Commons', 'prescription', 'spring', 'interfere', 'ably', 'checked', 'overruled', 'valued', 'selflessness', 'sympathize', 'boldest', 'majority', 'conference', 'Price', 'foreclosure', 'delusions', 'easily', 'dependable', 'described', 'remind', 'research', 'removing', 'subterfuge', 'sore', 'model', 'recognitions', 'circumstance', 'allies', 'morbid', 'insatiable', 'suffers', 'perception', 'enlargement', 'guardian', 'Pacific', 'baptism', 'clad', 'Fourth', 'subversion', 'Luther', 'artists', 'yielding', 'vessels', 'encroaches', 'wish', 'held', 'crises', 'Xthey', 'cutting', 'forums', 'remnant', 'disappointed', 'incoming', 'gratefully', 'shattered', 'waging', 'debts', 'color', 'ethics', 'specialized', 'amending', 'disappearing', 'Iowa', 'troubled', 'stead', 'disturbed', 'really', 'Texas', 'checking', 'averting', 'desires', 'newly', 'legislatures', 'expenditure', 'errant', 'Experiencing', 'bitterness', 'forces', 'dare', 'compress', 'unselfish', 'leadership', 'sovereignty', 'messages', 'warm', 'searching', 'unceasing', '4', 'he', 'roaming', 'kings', 'strife', 'felicity', 'cars', 'diamonds', 'things', 'according', '1917', 'incapable', 'Atlantic', 'prevailing', 'consul', 'act', 'advice', 'SYSTEM', 'snow', 'Middle', 'contending', 'possessing', 'lurks', 'navy', 'parents', 'rose', 'choices', 'mind', 'collisions', 'activity', 'retrospect', 'debasement', 'trust', 'delegation', 'depend', 'perfectly', 'deserts', 'finding', 'exultation', 'Subordinate', 'loveliness', 'refreshed', 'keeping', 'dying', 'appointees', 'Honoring', 'Everyone', 'wise', 'absurd', 'correspondent', 'incongruity', 'unrepealed', 'evils', 'perils', 'totalitarian', 
'universe', 'pall', 'rising', 'executing', 'disgraceful', 'afloat', 'blind', 'Beach', 'lie', 'incautiously', 'rise', 'inaugurate', 'Amid', 'region', 'plainest', 'accommodations', 'Fathers', 'tentative', 'downfall', 'mental', 'fortifications', 'perfected', 'thirteenth', 'ratifications', 'ascertain', 'generate', 'religion', 'collective', 'abnormal', 'an', 'departure', 'frowning', 'imitation', 'ways', 'collapse', 'compromise', 'efforts', 'present', 'appearing', 'planted', 'reflecting', 'obvious', 'directly', 'lawless', 'defenseless', 'represent', 'throng', 'danger', 'bona', 'Considering', 'organizations', 'given', 'carries', 'Economic', 'ninth', 'republics', 'FROM', 'fundamental', 'hundred', 'unsettled', 'decide', 'Barbary', 'deferred', 'newspaper', 'animating', 'patriot', 'blessings', 'prompt', 'conform', 'awakened', 'financing', 'inexhaustible', 'mall', 'touch', 'Mr', 'superintend', 'Governments', 'repression', 'controlling', 'sets', 'Massachusetts', 'disparage', 'contend', 'heard', 'International', 'violate', 'remedy', 'plighted', 'Self', 'objections', 'hanging', 'surpassed', 'provision', 'imperfection', 'excrescence', 'consoling', 'Sermon', 'draining', 'inaction', 'unfurl', 'drift', 'considering', 'bodily', 'arbitrary', 'honored', 'permitting', 'Conceived', 'zealously', 'revealed', 'discriminating', 'know', 'Experience', '$', 'beseeching', 'challenged', 'suffrage', 'ardor', 'acquainted', 'line', 'Eventually', 'unremittingly', 'teacher', 'yields', 'sow', 'cost', 'finally', 'loftiest', 'withheld', 'mediocrity', 'luxuries', 'remark', 'Xand', 'Indulging', 'steps', 'recoiled', 'steamship', 'frustrated', 'actions', 'died', 'uttered', 'requisite', 'restlessness', 'Service', 'await', 'backs', 'propose', 'foremost', 'essential', 'whereof', 'Presidential', 'asked', 'clerk', 'missions', 'expounded', 'Unlike', 'financial', 'receipts', 'belonging', 'slogans', 'domestic', 'computation', 'requires', 'Christmas', 'strengths', 'watching', 'addressed', 'disturbances', 'Bell', 
'idealistic', 'Relying', 'adore', 'across', 'swayed', 'tribute', 'admonish', ...}

In [11]: len(_) # number of distinct words (vocabulary size)
Out[11]: 9754

In [12]: quit()
$
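
The two TypeErrors in the session come from bigrams returning a generator, which has neither a length nor indexing. The following is a minimal sketch of ways to count and peek at bigrams without first building the full list, followed by the len(set(...)) computation from exercise 8. It makes some assumptions: the toy `tokens` list is a hypothetical stand-in for text5 and text4, and the hand-written `bigrams` function stands in for `nltk.bigrams` so that the example runs without NLTK installed.

```python
from itertools import islice


def bigrams(tokens):
    """Lazily yield adjacent token pairs (a stand-in for nltk.bigrams)."""
    it = iter(tokens)
    try:
        prev = next(it)
    except StopIteration:
        return  # empty input produces no bigrams
    for tok in it:
        yield prev, tok
        prev = tok


# Hypothetical toy data used in place of the NLTK texts.
tokens = "the quick brown fox jumps over the lazy dog The end".split()

# len() fails on a generator, so count by consuming it instead.
bigram_count = sum(1 for _ in bigrams(tokens))

# Slicing fails on a generator, so take the first items with islice.
first_three = list(islice(bigrams(tokens), 3))

# Exercise 8: set() removes duplicate tokens, then len() counts what is left.
vocab_size = len(set(tokens))

# Lowercasing first merges 'the' and 'The' into a single entry.
vocab_size_lower = len({t.lower() for t in tokens})

print(bigram_count, first_three, vocab_size, vocab_size_lower)
```

With the real corpus, the same pattern would be applied to text5 and text4 after `from nltk import bigrams`. Note that set(text4) is case-sensitive, so the 9754 reported above counts 'From' and 'FROM' as separate items.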