Install NLTK

Linux:

  1. Install NumPy (optional):
    sudo pip install -U numpy
  2. Install NLTK:
    sudo pip install -U nltk

Windows:

  1. Install NumPy:
    http://www.numpy.org/
  2. Install NLTK:
    http://pypi.python.org/pypi/nltk
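
After either install path, it can help to confirm that Python can actually see the packages. The following is a minimal sketch using only the standard library; the helper name is_installed is made up for this illustration and is not part of NLTK or NumPy.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path.

    Hypothetical helper for illustration; it only locates the module
    and does not import it.
    """
    return importlib.util.find_spec(module_name) is not None


# Report the status of the two packages installed above.
for name in ("numpy", "nltk"):
    status = "found" if is_installed(name) else "missing"
    print("%s: %s" % (name, status))
```

A "missing" result usually means pip installed into a different Python interpreter than the one running the check.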


Executing NLTK

Running the Python code below raises a LookupError, because NLTK Data has not been installed yet.

To fix this, install NLTK Data by calling nltk.download() in the Python shell.

>>> import nltk
>>> texts = nltk.word_tokenize("I am going to Seoul, Korea.")
>>> print nltk.pos_tag(texts)
    Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
          File "/usr/local/lib/python2.7/dist-packages/nltk/tag/__init__.py", line 99, in pos_tag
              tagger = load(_POS_TAGGER)
          File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 605, in load
              resource_val = pickle.load(_open(resource_url))
          File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 686, in _open
              return find(path).open()
          File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 467, in find
              raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource 'taggers/maxent_treebank_pos_tagger/english.pickle' not
    found.  Please use the NLTK Downloader to obtain the resource:
    >>> nltk.download()
    Searched in:
      - '/home/changuk/nltk_data'
      - '/usr/share/nltk_data'
      - '/usr/local/share/nltk_data'
      - '/usr/lib/nltk_data'
      - '/usr/local/lib/nltk_data'
**********************************************************************
>>> nltk.download()
NLTK Downloader
---------------------------------------------------------------------------
    d) Download   l) List    u) Update   c) Config   h) Help   q) Quit
---------------------------------------------------------------------------
Downloader> d (enter)

Download which package (l=list; x=cancel)?
  Identifier> all (enter)
     Done downloading collection 'all'
---------------------------------------------------------------------------
    d) Download      l) List      c) Config      h) Help      q) Quit
---------------------------------------------------------------------------
Downloader> q (enter)
True
>>> res = nltk.pos_tag(texts)
>>> print(res)
[('I', 'PRP'), ('am', 'VBP'), ('going', 'VBG'), ('to', 'TO'), ('Seoul', 'NNP'),
(',', ','), ('Korea', 'NNP'), ('.', '.')]
>>> type(res)
<type 'list'>
>>>

Once NLTK Data is installed, running the same Python code again works as expected, as the end of the session above shows.
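
The interactive downloader can be avoided in scripts by checking for the resource first and downloading only when the lookup fails. The sketch below is one way to structure that, not NLTK's own mechanism: ensure_resource is a hypothetical helper, and the lookup and download functions are passed in so the logic does not depend on a particular NLTK version. In practice they would be nltk.data.find and nltk.download. The package identifier in the usage comment is an assumption inferred from the resource path in the error message above.

```python
def ensure_resource(resource_path, package_id, find, download):
    """Make sure a data resource is available, downloading it if needed.

    Hypothetical helper for illustration.
    find(resource_path) must raise LookupError when the resource is missing
    (nltk.data.find behaves this way); download(package_id) fetches a package
    (for example nltk.download).
    Returns True if the resource can be found afterwards, else False.
    """
    try:
        find(resource_path)
        return True  # already installed, nothing to download
    except LookupError:
        download(package_id)

    # Look again, because the download may have failed.
    try:
        find(resource_path)
        return True
    except LookupError:
        return False


# Assumed usage with NLTK (package identifier inferred from the error text):
#   import nltk
#   ensure_resource('taggers/maxent_treebank_pos_tagger/english.pickle',
#                   'maxent_treebank_pos_tagger',
#                   nltk.data.find, nltk.download)
```

Downloading a single package this way is much smaller than the 'all' collection used in the session above, at the cost of having to know which resource each function needs.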



POS Tags

POS Tag  Description                              Example
CC       coordinating conjunction                 and
CD       cardinal number                          1, third
DT       determiner                               the
EX       existential there                        there is
FW       foreign word                             d'oeuvre
IN       preposition/subordinating conjunction    in, of, like
JJ       adjective                                big
JJR      adjective, comparative                   bigger
JJS      adjective, superlative                   biggest
LS       list marker                              1)
MD       modal                                    could, will
NN       noun, singular or mass                   door
NNS      noun, plural                             doors
NNP      proper noun, singular                    John
NNPS     proper noun, plural                      Vikings
PDT      predeterminer                            both the boys
POS      possessive ending                        friend's
PRP      personal pronoun                         I, he, it
PRP$     possessive pronoun                       my, his
RB       adverb                                   however, usually, naturally, here, good
RBR      adverb, comparative                      better
RBS      adverb, superlative                      best
RP       particle                                 give up
TO       to                                       to go, to him
UH       interjection                             uhhuhhuhh
VB       verb, base form                          take
VBD      verb, past tense                         took
VBG      verb, gerund/present participle          taking
VBN      verb, past participle                    taken
VBP      verb, non-3rd person singular present    take
VBZ      verb, 3rd person singular present        takes
WDT      wh-determiner                            which
WP       wh-pronoun                               who, what
WP$      possessive wh-pronoun                    whose
WRB      wh-adverb                                where, when
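
Because related tags share a prefix (NN, NNS, NNP, NNPS for nouns; VB, VBD, VBG and so on for verbs), tagged output can be filtered with a simple prefix match. The sketch below reuses the tagged result from the session above as a literal list, so it runs without NLTK; words_with_tag_prefix is a name chosen for this illustration.

```python
# Tagged tokens copied from the nltk.pos_tag() output shown earlier.
tagged = [('I', 'PRP'), ('am', 'VBP'), ('going', 'VBG'), ('to', 'TO'),
          ('Seoul', 'NNP'), (',', ','), ('Korea', 'NNP'), ('.', '.')]


def words_with_tag_prefix(tagged_tokens, prefix):
    """Return the words whose POS tag starts with the given prefix."""
    return [word for word, tag in tagged_tokens if tag.startswith(prefix)]


proper_nouns = words_with_tag_prefix(tagged, 'NNP')
verbs = words_with_tag_prefix(tagged, 'VB')

print(proper_nouns)  # ['Seoul', 'Korea']
print(verbs)         # ['am', 'going']
```

Prefix matching is a convenience, not a rule of the tag set: 'PRP' also matches 'PRP$', so use an exact comparison when only one tag is wanted.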



References
  1. http://www.nltk.org