Install NLTK
Linux:
- Install Numpy (optional):
sudo pip install -U numpy
- Install NLTK:
sudo pip install -U nltk
Windows:
- Install Numpy:
http://www.numpy.org/
- Install NLTK:
http://pypi.python.org/pypi/nltk
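After installing, it can be useful to confirm that Python actually sees both packages. The following is a minimal sketch using only the standard library; the helper name `check_installed` is made up for this illustration and is not part of NLTK or NumPy.

```python
import importlib.util


def check_installed(module_names):
    """Return a dict mapping each top-level module name to True if importable."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


# Report on the two packages this guide installs.
for name, found in check_installed(["numpy", "nltk"]).items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

If either package is reported as missing, re-run the matching install step above for the interpreter you are using.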
Executing NLTK
Running the Python code below as-is raises an error, because NLTK Data has not been installed yet.
To fix this, install NLTK Data from the Python shell with the nltk.download() command.
>>> import nltk
>>> texts = nltk.word_tokenize("I am going to Seoul, Korea.")
>>> print nltk.pos_tag(texts)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/nltk/tag/__init__.py", line 99, in pos_tag
    tagger = load(_POS_TAGGER)
  File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 605, in load
    resource_val = pickle.load(_open(resource_url))
  File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 686, in _open
    return find(path).open()
  File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 467, in find
    raise LookupError(resource_not_found)
LookupError:
**********************************************************************
  Resource 'taggers/maxent_treebank_pos_tagger/english.pickle' not found.
  Please use the NLTK Downloader to obtain the resource:
  >>> nltk.download()
  Searched in:
    - '/home/changuk/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
>>> nltk.download()
NLTK Downloader
---------------------------------------------------------------------------
    d) Download   l) List   u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> d (enter)
Download which package (l=list; x=cancel)?
  Identifier> all (enter)
Done downloading collection 'all'
---------------------------------------------------------------------------
    d) Download   l) List   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> q (enter)
True
>>> res = nltk.pos_tag(texts)
>>> print(res)
[('I', 'PRP'), ('am', 'VBP'), ('going', 'VBG'), ('to', 'TO'), ('Seoul', 'NNP'), (',', ','), ('Korea', 'NNP'), ('.', '.')]
>>> type(res)
<type 'list'>
>>>
Once NLTK Data is installed, running the same Python code again confirms that it now works correctly.
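Downloading the whole 'all' collection works, but a script can also check for one resource and fetch it only when it is missing. The sketch below assumes the resource path from the error message above, and it assumes the package identifier is `maxent_treebank_pos_tagger`; other NLTK versions may ask for a different resource, so use whatever name your own LookupError reports. `ensure_resource` and `ensure_tagger_data` are hypothetical helper names, while `nltk.data.find` and `nltk.download` are NLTK's own functions.

```python
def ensure_resource(resource_path, package_id, finder, downloader):
    """Download package_id only if finder cannot locate resource_path.

    Returns True if a download was triggered, False if the data was present.
    """
    try:
        finder(resource_path)
        return False
    except LookupError:
        # The same exception type shown in the traceback above.
        downloader(package_id)
        return True


def ensure_tagger_data():
    """Apply ensure_resource to the tagger named in the error message."""
    import nltk  # imported here so the helper above stays usable without NLTK

    return ensure_resource(
        "taggers/maxent_treebank_pos_tagger/english.pickle",
        "maxent_treebank_pos_tagger",  # assumed package identifier
        nltk.data.find,
        nltk.download,
    )
```

Calling `ensure_tagger_data()` once before `nltk.pos_tag` avoids the interactive downloader. The finder and downloader are passed in as arguments so the check can be exercised without network access.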
POS Tags
POS Tag | Description | Example |
---|---|---|
CC | coordinating conjunction | and |
CD | cardinal number | 1, third |
DT | determiner | the |
EX | existential there | there is |
FW | foreign word | d'oeuvre |
IN | preposition/subordinating conjunction | in, of, like |
JJ | adjective | big |
JJR | adjective, comparative | bigger |
JJS | adjective, superlative | biggest |
LS | list marker | 1) |
MD | modal | could, will |
NN | noun, singular or mass | door |
NNS | noun plural | doors |
NNP | proper noun, singular | John |
NNPS | proper noun, plural | Vikings |
PDT | predeterminer | both the boys |
POS | possessive ending | friend's |
PRP | personal pronoun | I, he, it |
PRP$ | possessive pronoun | my, his |
RB | adverb | however, usually, naturally, here, good |
RBR | adverb, comparative | better |
RBS | adverb, superlative | best |
RP | particle | give up |
TO | to | to go, to him |
UH | interjection | uhhuhhuhh |
VB | verb, base form | take |
VBD | verb, past tense | took |
VBG | verb, gerund/present participle | taking |
VBN | verb, past participle | taken |
VBP | verb, non-3rd person singular present | take |
VBZ | verb, 3rd person sing. present | takes |
WDT | wh-determiner | which |
WP | wh-pronoun | who, what |
WP$ | possessive wh-pronoun | whose |
WRB | wh-adverb | where, when |
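Because pos_tag returns a plain list of (word, tag) tuples, the tags in the table can be used directly to filter or count words. The following sketch works on the tagged output copied from the session earlier in this post, so it runs without NLTK; `words_with_tag` and `tag_counts` are illustrative helper names, not NLTK functions.

```python
from collections import Counter

# Tagged output copied from the session earlier in this post.
tagged = [('I', 'PRP'), ('am', 'VBP'), ('going', 'VBG'), ('to', 'TO'),
          ('Seoul', 'NNP'), (',', ','), ('Korea', 'NNP'), ('.', '.')]


def words_with_tag(tagged_words, tag_prefix):
    """Return the words whose tag starts with tag_prefix, in original order."""
    return [word for word, tag in tagged_words if tag.startswith(tag_prefix)]


def tag_counts(tagged_words):
    """Count how often each tag occurs."""
    return Counter(tag for _, tag in tagged_words)


print(words_with_tag(tagged, "NNP"))  # ['Seoul', 'Korea']
print(words_with_tag(tagged, "VB"))   # ['am', 'going']
print(tag_counts(tagged)["NNP"])      # 2
```

Matching on a prefix is a deliberate choice here: "VB" covers every verb form in the table (VB, VBD, VBG, VBN, VBP, VBZ), and "NN" would likewise cover all noun tags.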