Install NLTK
Linux:
- Install Numpy (optional):
    sudo pip install -U numpy
- Install NLTK:
    sudo pip install -U nltk
Windows:
- Install Numpy: http://www.numpy.org/
- Install NLTK: http://pypi.python.org/pypi/nltk
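To confirm that the installation steps above took effect, a quick check along these lines may help. This is only a sketch: it assumes Python 3 (for `importlib.util.find_spec`), and `is_installed` is a hypothetical helper name, not part of NLTK.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


# Report the state of the two packages installed above.
for name in ("numpy", "nltk"):
    status = "installed" if is_installed(name) else "missing"
    print(name + ": " + status)
```

If either package is reported as missing, the pip command was probably run for a different Python interpreter than the one executing this check.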
Executing NLTK
Running the Python code below raises an error, because NLTK Data has not been installed yet.
Install NLTK Data from the Python shell with the nltk.download() command.
    >>> import nltk
    >>> texts = nltk.word_tokenize("I am going to Seoul, Korea.")
    >>> print nltk.pos_tag(texts)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/dist-packages/nltk/tag/__init__.py", line 99, in pos_tag
        tagger = load(_POS_TAGGER)
      File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 605, in load
        resource_val = pickle.load(_open(resource_url))
      File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 686, in _open
        return find(path).open()
      File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 467, in find
        raise LookupError(resource_not_found)
    LookupError:
    **********************************************************************
      Resource 'taggers/maxent_treebank_pos_tagger/english.pickle' not
      found.  Please use the NLTK Downloader to obtain the resource:
      >>> nltk.download()
      Searched in:
        - '/home/changuk/nltk_data'
        - '/usr/share/nltk_data'
        - '/usr/local/share/nltk_data'
        - '/usr/lib/nltk_data'
        - '/usr/local/lib/nltk_data'
    **********************************************************************
    >>> nltk.download()
    NLTK Downloader
    ---------------------------------------------------------------------------
        d) Download   l) List   u) Update   c) Config   h) Help   q) Quit
    ---------------------------------------------------------------------------
    Downloader> d (enter)

    Download which package (l=list; x=cancel)?
      Identifier> all (enter)
        Done downloading collection 'all'

    ---------------------------------------------------------------------------
        d) Download   l) List   c) Config   h) Help   q) Quit
    ---------------------------------------------------------------------------
    Downloader> q (enter)
    True
    >>> res = nltk.pos_tag(texts)
    >>> print(res)
    [('I', 'PRP'), ('am', 'VBP'), ('going', 'VBG'), ('to', 'TO'), ('Seoul', 'NNP'),
    (',', ','), ('Korea', 'NNP'), ('.', '.')]
    >>> type(res)
    <type 'list'>
    >>>
Once the NLTK Data installation is complete, running the Python code above again works as expected.
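Downloading the `all` collection fetches every package in the NLTK index, which can take a while. As an alternative, the sketch below downloads a single package only when the resource is missing. It relies on `nltk.data.find()` raising `LookupError` (as in the traceback above) and on `nltk.download()` accepting a package identifier. The helper name `ensure_resource` is hypothetical, the code assumes Python 3, and the identifier shown in the usage note is taken from the error message above; other NLTK versions may ask for a different resource, so copy the path from whatever `LookupError` your version prints.

```python
def ensure_resource(resource_path, package_id):
    """Download `package_id` only if `resource_path` is not installed yet.

    Returns True if the resource is available afterwards.
    """
    # Imported inside the function so it can be defined before NLTK is installed.
    import nltk

    try:
        # Searches the nltk_data directories listed in the error message.
        nltk.data.find(resource_path)
        return True
    except LookupError:
        # Non-interactive counterpart of choosing "d) Download" in the downloader.
        return bool(nltk.download(package_id))
```

For the session above, the call would be `ensure_resource("taggers/maxent_treebank_pos_tagger/english.pickle", "maxent_treebank_pos_tagger")`, placed before the call to `nltk.pos_tag()`.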
POS Tags
| POS Tag | Description | Example |
|---|---|---|
| CC | coordinating conjunction | and |
| CD | cardinal number | 1, three |
| DT | determiner | the |
| EX | existential there | there is |
| FW | foreign word | d'oeuvre |
| IN | preposition/subordinating conjunction | in, of, like |
| JJ | adjective | big |
| JJR | adjective, comparative | bigger |
| JJS | adjective, superlative | biggest |
| LS | list marker | 1) |
| MD | modal | could, will |
| NN | noun, singular or mass | door |
| NNS | noun plural | doors |
| NNP | proper noun, singular | John |
| NNPS | proper noun, plural | Vikings |
| PDT | predeterminer | both the boys |
| POS | possessive ending | friend's |
| PRP | personal pronoun | I, he, it |
| PRP$ | possessive pronoun | my, his |
| RB | adverb | however, usually, naturally, here |
| RBR | adverb, comparative | better |
| RBS | adverb, superlative | best |
| RP | particle | give up |
| TO | to | to go, to him |
| UH | interjection | uh, huh |
| VB | verb, base form | take |
| VBD | verb, past tense | took |
| VBG | verb, gerund/present participle | taking |
| VBN | verb, past participle | taken |
| VBP | verb, non-3rd person singular present | take |
| VBZ | verb, 3rd person sing. present | takes |
| WDT | wh-determiner | which |
| WP | wh-pronoun | who, what |
| WP$ | possessive wh-pronoun | whose |
| WRB | wh-adverb | where, when |
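The tags in the table can be used to filter or annotate the list that `nltk.pos_tag()` returns. The sketch below works on the tagged output copied from the session earlier, so it runs without NLTK. The helper names `words_with_tag` and `describe` are hypothetical, and the small description dictionary is abbreviated from a few rows of the table rather than loaded from NLTK.

```python
# Result of nltk.pos_tag() copied from the session earlier in this post.
tagged = [('I', 'PRP'), ('am', 'VBP'), ('going', 'VBG'), ('to', 'TO'),
          ('Seoul', 'NNP'), (',', ','), ('Korea', 'NNP'), ('.', '.')]

# Descriptions for the tags that occur in the sample, abbreviated from the table.
TAG_DESCRIPTIONS = {
    'PRP': 'personal pronoun',
    'VBP': 'verb, non-3rd person singular present',
    'VBG': 'verb, gerund/present participle',
    'TO': 'to',
    'NNP': 'proper noun, singular',
}


def words_with_tag(tagged_words, prefix):
    """Return the words whose tag starts with `prefix` (e.g. 'VB' covers VB, VBD, VBG, ...)."""
    return [word for word, tag in tagged_words if tag.startswith(prefix)]


def describe(tagged_words):
    """Attach a description to each (word, tag) pair; punctuation tags are not in the table."""
    return [(word, tag, TAG_DESCRIPTIONS.get(tag, 'not in the table'))
            for word, tag in tagged_words]


print(words_with_tag(tagged, 'NNP'))  # ['Seoul', 'Korea']
print(words_with_tag(tagged, 'VB'))   # ['am', 'going']
for word, tag, description in describe(tagged):
    print(word, tag, description)
```

Matching on a tag prefix is a convenience here: it groups related tags such as VBP and VBG under 'VB', at the cost of also matching any other tag that happens to share the prefix.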