<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
|
|
|
<html>
|
|
|
<head>
|
|
|
<title>pattern</title>
|
|
|
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
|
|
|
<link type="text/css" rel="stylesheet" href="../clips.css" />
|
|
|
<style>
|
|
|
/* Small fixes because we omit the online layout.css. */
|
|
|
h3 { line-height: 1.3em; }
|
|
|
#page { margin-left: auto; margin-right: auto; }
|
|
|
#header, #header-inner { height: 175px; }
|
|
|
#header { border-bottom: 1px solid #C6D4DD; }
|
|
|
table { border-collapse: collapse; }
|
|
|
#checksum { display: none; }
|
|
|
</style>
|
|
|
<link href="../js/shCore.css" rel="stylesheet" type="text/css" />
|
|
|
<link href="../js/shThemeDefault.css" rel="stylesheet" type="text/css" />
|
|
|
<script language="javascript" src="../js/shCore.js"></script>
|
|
|
<script language="javascript" src="../js/shBrushXml.js"></script>
|
|
|
<script language="javascript" src="../js/shBrushJScript.js"></script>
|
|
|
<script language="javascript" src="../js/shBrushPython.js"></script>
|
|
|
</head>
|
|
|
<body class="node-type-page one-sidebar sidebar-right section-pages">
|
|
|
<div id="page">
|
|
|
<div id="page-inner">
|
|
|
<div id="header"><div id="header-inner"></div></div>
|
|
|
<div id="content">
|
|
|
<div id="content-inner">
|
|
|
<div class="node node-type-page">
|
|
|
<div class="node-inner">
|
|
|
<div class="breadcrumb">View online at: <a href="http://www.clips.ua.ac.be/pages/pattern" class="noexternal" target="_blank">http://www.clips.ua.ac.be/pages/pattern</a></div>
|
|
|
<h1>pattern</h1>
|
|
|
<!-- Parsed from the online documentation. -->
|
|
|
<div id="node-1350" class="node node-type-page"><div class="node-inner">
|
|
|
<div class="content">
|
|
|
<p><span class="big">Pattern is a web mining module for the Python programming language.</span></p>
|
|
|
<p><span class="big">It has tools for data mining (Google, Twitter and Wikipedia API, a web crawler, an HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis and &lt;canvas&gt; visualization.</span></p>
|
|
|
<p>The module is free, well-documented and bundled with 50+ examples and 350+ unit tests.</p>
|
|
|
<p><img src="../g/pattern_schema.gif" alt="" width="620" height="180" /></p>
|
|
|
<hr />
|
|
|
<h2>Download</h2>
|
|
|
<table>
|
|
|
<tbody>
|
|
|
<tr>
|
|
|
<td><a href="http://www.clips.ua.ac.be/media/pattern-2.6.zip" target="_self"><img src="../g/download.gif" alt="download" align="left" /></a></td>
<td><strong>Pattern 2.6</strong> | <a href="http://www.clips.ua.ac.be/media/pattern-2.6.zip" target="_self">download</a> (.zip, 23MB)<br />
|
|
|
<ul>
|
|
|
<li>Requires: Python 2.5+ on Windows | Mac | Linux</li>
|
|
|
<li>Licensed under <a href="http://www.linfo.org/bsdlicense.html" target="_blank">BSD</a></li>
|
|
|
<li>Latest releases: <a class="noexternal" href="http://www.clips.ua.ac.be/media/pattern-2.6.zip">2.6</a> | <a class="noexternal" href="http://www.clips.ua.ac.be/media/pattern-2.5.zip">2.5</a> | <a class="noexternal" href="http://www.clips.ua.ac.be/media/pattern-2.4.zip">2.4</a> | <a class="noexternal" href="http://www.clips.ua.ac.be/media/pattern-2.3.zip">2.3</a> | <a class="noexternal" href="http://www.clips.ua.ac.be/media/pattern-2.2.zip">2.2</a> | <a class="noexternal" href="http://www.clips.ua.ac.be/media/pattern-2.1.zip">2.1</a> | <a class="noexternal" href="http://www.clips.ua.ac.be/media/pattern-2.0.zip">2.0</a></li>
|
|
|
<li>Authors:<br /> Tom De Smedt (<em>tom at organisms.be</em>)<br /> Walter Daelemans </li>
|
|
|
</ul>
|
|
|
<p><span class="small"><span style="text-decoration: underline;">Reference</span>: De Smedt, T. &amp; Daelemans, W. (2012)</span>.<br /><span class="small">Pattern for Python. <em>Journal of Machine Learning Research</em>, 13: 2031–2035.</span></p>
|
|
|
<p id="checksum" class="grey"><span class="small"><span style="text-decoration: underline;">SHA256</span> checksum of the .zip:<br />28213f05d94a86d2de1d8a03525d456a9e68dc3b563dc2481ad08fe3db180d02</span></p>
|
|
|
</td>
|
|
|
<td>
|
|
|
</td>
|
|
|
</tr>
|
|
|
</tbody>
|
|
|
</table>
|
|
|
<p> </p>
|
|
|
<hr />
|
|
|
<table border="0">
|
|
|
<tbody>
|
|
|
<tr>
|
|
|
<td style="width: 200px;">
|
|
|
<h2>Modules</h2>
|
|
|
<ul>
|
|
|
<li><a href="pattern-web.html">pattern.web</a></li>
|
|
|
<li><a href="pattern-db.html">pattern.db</a></li>
|
|
|
<li><a href="pattern-en.html">pattern.en</a> | <a href="pattern-es.html">es</a> | <a href="pattern-de.html">de</a> | <a href="pattern-fr.html">fr</a> | <a href="pattern-it.html">it</a> | <a href="pattern-nl.html">nl</a></li>
|
|
|
<li><a href="pattern-search.html">pattern.search</a></li>
|
|
|
<li><a href="pattern-vector.html">pattern.vector</a></li>
|
|
|
<li><a href="pattern-graph.html">pattern.graph</a> </li>
|
|
|
</ul>
|
|
|
<p><span class="smallcaps">Helper modules</span></p>
|
|
|
<ul style="margin-top: 0;">
|
|
|
<li><a href="pattern-metrics.html">pattern.metrics</a></li>
|
|
|
<li><a href="pattern-canvas.html">canvas.js</a></li>
|
|
|
</ul>
|
|
|
<p><span class="smallcaps">Command-line</span></p>
|
|
|
<ul style="margin-top: 0;">
|
|
|
<li><a href="pattern-shell.html">Command-line interface</a></li>
|
|
|
</ul>
|
|
|
</td>
|
|
|
<td>
|
|
|
<h2><a name="contribute"></a>Contribute</h2>
|
|
|
<ul>
|
|
|
<li><a href="pattern-dev.html">Developer documentation</a></li>
|
|
|
<li><a href="https://github.com/clips/pattern" target="_blank">GitHub repository</a></li>
|
|
|
<li><a href="http://groups.google.com/group/pattern-for-python" target="_blank">Google group</a></li>
|
|
|
</ul>
|
|
|
<form action="https://www.paypal.com/cgi-bin/webscr" method="post"><input type="hidden" name="cmd" value="_s-xclick" /> <input type="hidden" name="hosted_button_id" value="HW2GU5PNWYQV8" /> <input type="image" name="submit" src="../g/paypal-donate.jpg" alt="PayPal - The safer, easier way to pay online!" /> <img src="https://www.paypalobjects.com/en_US/i/scr/pixel.gif" alt="" width="1" height="1" border="0" /></form>
|
|
|
</td>
|
|
|
</tr>
|
|
|
</tbody>
|
|
|
</table>
|
|
|
<p> </p>
|
|
|
<hr />
|
|
|
<h2>Installation</h2>
|
|
|
<p>Pattern is written for Python 2.5+ (Python 3 is not supported in this release). The module has no external dependencies, except <span class="inline_code">LSA</span> in the pattern.vector module, which requires <a href="http://numpy.scipy.org/" target="_blank">NumPy</a> (installed by default on Mac OS X). </p>
|
|
|
<p>To install Pattern so that the module is available in all Python scripts, from the command line do:</p>
|
|
|
<div class="install">
|
|
|
<pre class="gutter:false; light:true;">> cd pattern-2.6
> python setup.py install </pre></div>
|
|
|
<p>If you have pip, you can automatically download and install from the PyPI repository:</p>
|
|
|
<div class="install">
|
|
|
<pre class="gutter:false; light:true;">> pip install pattern</pre></div>
|
|
|
<p>If none of the above works, you can make Python aware of the module in three ways:</p>
|
|
|
<ul>
|
|
|
<li>Put the <span class="inline_code">pattern</span> subfolder in the .zip archive in the same folder as your script.</li>
|
|
|
<li>Put the <span class="inline_code">pattern</span> subfolder in the standard location for modules so it is available to all scripts:<br /><span class="inline_code">c:\python27\Lib\site-packages\</span> (Windows),<br /><span class="inline_code">/Library/Python/2.7/site-packages/</span> (Mac),<br /><span class="inline_code">/usr/lib/python2.7/site-packages/</span> (Unix).</li>
|
|
|
<li>Add the location of the module to <span class="inline_code">sys.path</span> in your Python script, before importing it:</li>
|
|
|
</ul>
|
|
|
<div class="example">
|
|
|
<pre class="brush:python; gutter:false; light:true;">>>> import sys; sys.path.append('/users/tom/desktop/pattern')
>>> from pattern.web import Twitter </pre></div>
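<p>The last option can be made a little more defensive. The sketch below only appends the folder if it exists and is not yet on <span class="inline_code">sys.path</span>. The helper name <span class="inline_code">add_module_path</span> is hypothetical and not part of Pattern, and the folder is the same placeholder location used above.</p>

```python
import os
import sys

def add_module_path(folder):
    """Append folder to sys.path once; return True if it was added.

    Hypothetical helper for illustration, not part of Pattern.
    """
    folder = os.path.abspath(folder)
    if not os.path.isdir(folder):
        return False  # Nothing to import from a missing folder.
    if folder in sys.path:
        return False  # Already importable, avoid duplicate entries.
    sys.path.append(folder)
    return True

# Placeholder location; adjust to where the archive was unpacked.
added = add_module_path('/users/tom/desktop/pattern')
```

<p>If <span class="inline_code">added</span> is <span class="inline_code">False</span>, either the folder does not exist or it was already on the path, which is usually easier to debug than a late <span class="inline_code">ImportError</span>.</p>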
|
|
|
<p> </p>
|
|
|
<hr />
|
|
|
<h2>Quick overview</h2>
|
|
|
<h3>pattern.web</h3>
|
|
|
<p>The <a href="pattern-web.html">pattern.web</a> module is a web toolkit that contains APIs (Google, Gmail, Bing, Twitter, Facebook, Wikipedia, Wiktionary, DBPedia, Flickr, ...), a robust HTML DOM parser and a web crawler.</p>
|
|
|
<div class="example">
|
|
|
<pre class="brush:python; gutter:false; light:true;">>>> from pattern.web import Twitter, plaintext
>>>
>>> twitter = Twitter(language='en')
>>> for tweet in twitter.search('"more important than"', cached=False):
>>>     print plaintext(tweet.text)

'The mobile web is more important than mobile apps.'
'Start slowly, direction is more important than speed.'
'Imagination is more important than knowledge. - Albert Einstein'
... </pre></div>
|
|
|
<h3>pattern.en</h3>
|
|
|
<p>The <a href="pattern-en.html">pattern.en</a> module is a natural language processing (NLP) toolkit for English. Because language is ambiguous (e.g., <em>I can</em> ↔ <em>a can</em>) it uses statistical approaches + regular expressions. This means that it is fast, quite accurate and occasionally incorrect. It has a part-of-speech tagger that identifies word types (e.g., noun, verb, adjective), word inflection (conjugation, singularization) and a WordNet API.</p>
|
|
|
<div class="example">
|
|
|
<pre class="brush:python; gutter:false; light:true;">>>> from pattern.en import parse
>>>
>>> s = 'The mobile web is more important than mobile apps.'
>>> s = parse(s, relations=True, lemmata=True)
>>> print s

'The/DT/B-NP/O/NP-SBJ-1/the mobile/JJ/I-NP/O/NP-SBJ-1/mobile' ...
</pre></div>
|
|
|
<table class="border">
|
|
|
<tbody>
|
|
|
<tr>
|
|
|
<td class="smallcaps" style="text-align: right;">word</td>
|
|
|
<td class="smallcaps" style="text-align: center;">tag</td>
|
|
|
<td class="smallcaps" style="text-align: center;">chunk</td>
|
|
|
<td class="smallcaps" style="text-align: center;">role</td>
|
|
|
<td class="smallcaps" style="text-align: center;">id</td>
|
|
|
<td class="smallcaps" style="text-align: center;">pnp</td>
|
|
|
<td class="smallcaps">lemma</td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td style="text-align: right;">The</td>
|
|
|
<td class="inline_code" style="text-align: center;">DT</td>
|
|
|
<td class="inline_code" style="text-align: center;">NP </td>
|
|
|
<td class="inline_code" style="text-align: center;">SBJ</td>
|
|
|
<td class="inline_code" style="text-align: center;">1</td>
|
|
|
<td class="inline_code" style="text-align: center;">-</td>
|
|
|
<td><em>the</em></td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td style="text-align: right;">mobile</td>
|
|
|
<td class="inline_code" style="text-align: center;">JJ</td>
|
|
|
<td class="inline_code" style="text-align: center;">NP^</td>
|
|
|
<td class="inline_code" style="text-align: center;">SBJ</td>
|
|
|
<td class="inline_code" style="text-align: center;">1</td>
|
|
|
<td class="inline_code" style="text-align: center;">-</td>
|
|
|
<td><em>mobile</em></td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td style="text-align: right;">web</td>
|
|
|
<td class="inline_code" style="text-align: center;">NN</td>
|
|
|
<td class="inline_code" style="text-align: center;">NP^</td>
|
|
|
<td class="inline_code" style="text-align: center;">SBJ</td>
|
|
|
<td class="inline_code" style="text-align: center;">1</td>
|
|
|
<td class="inline_code" style="text-align: center;">-</td>
|
|
|
<td><em>web</em></td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td style="text-align: right;">is</td>
|
|
|
<td class="inline_code" style="text-align: center;">VBZ</td>
|
|
|
<td class="inline_code" style="text-align: center;">VP </td>
|
|
|
<td class="inline_code" style="text-align: center;">-</td>
|
|
|
<td class="inline_code" style="text-align: center;">1</td>
|
|
|
<td class="inline_code" style="text-align: center;">-</td>
|
|
|
<td><em>be</em></td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td style="text-align: right;">more</td>
|
|
|
<td class="inline_code" style="text-align: center;">RBR</td>
|
|
|
<td class="inline_code" style="text-align: center;">ADJP </td>
|
|
|
<td class="inline_code" style="text-align: center;">-</td>
|
|
|
<td class="inline_code" style="text-align: center;">-</td>
|
|
|
<td class="inline_code" style="text-align: center;">-</td>
|
|
|
<td><em>more</em></td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td style="text-align: right;">important</td>
|
|
|
<td class="inline_code" style="text-align: center;">JJ</td>
|
|
|
<td class="inline_code" style="text-align: center;">ADJP^</td>
|
|
|
<td class="inline_code" style="text-align: center;">-</td>
|
|
|
<td class="inline_code" style="text-align: center;">-</td>
|
|
|
<td class="inline_code" style="text-align: center;">-</td>
|
|
|
<td><em>important</em></td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td style="text-align: right;">than</td>
|
|
|
<td class="inline_code" style="text-align: center;">IN</td>
|
|
|
<td class="inline_code" style="text-align: center;">PP </td>
|
|
|
<td class="inline_code" style="text-align: center;">-</td>
|
|
|
<td class="inline_code" style="text-align: center;">-</td>
|
|
|
<td class="inline_code" style="text-align: center;">PNP</td>
|
|
|
<td><em>than</em></td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td style="text-align: right;">mobile</td>
|
|
|
<td class="inline_code" style="text-align: center;">JJ</td>
|
|
|
<td class="inline_code" style="text-align: center;">NP </td>
|
|
|
<td class="inline_code" style="text-align: center;">-</td>
|
|
|
<td class="inline_code" style="text-align: center;">-</td>
|
|
|
<td class="inline_code" style="text-align: center;">PNP</td>
|
|
|
<td><em>mobile</em></td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td style="text-align: right;">apps</td>
|
|
|
<td class="inline_code" style="text-align: center;">NNS</td>
|
|
|
<td class="inline_code" style="text-align: center;">NP^</td>
|
|
|
<td class="inline_code" style="text-align: center;">-</td>
|
|
|
<td class="inline_code" style="text-align: center;">-</td>
|
|
|
<td class="inline_code" style="text-align: center;">PNP</td>
|
|
|
<td><em>app</em></td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td style="text-align: right;">.</td>
|
|
|
<td class="inline_code" style="text-align: center;">.</td>
|
|
|
<td class="inline_code" style="text-align: center;">- </td>
|
|
|
<td class="inline_code" style="text-align: center;">-</td>
|
|
|
<td class="inline_code" style="text-align: center;">-</td>
|
|
|
<td class="inline_code" style="text-align: center;">-</td>
|
|
|
<td>.</td>
|
|
|
</tr>
|
|
|
</tbody>
|
|
|
</table>
|
|
|
<p>The text has been annotated with word types, for example nouns (<span class="postag">NN</span>), verbs (<span class="postag">VB</span>), adjectives (<span class="postag">JJ</span>) and determiners (<span class="postag">DT</span>), word roles (e.g., sentence subject <span class="postag">SBJ</span>) and prepositional noun phrases (<span class="postag">PNP</span>). To iterate over the parts in the tagged text we can construct a <em>parse tree</em>.</p>
|
|
|
<div class="example">
|
|
|
<pre class="brush:python; gutter:false; light:true;">>>> from pattern.en import parsetree
>>>
>>> s = 'The mobile web is more important than mobile apps.'
>>> s = parsetree(s)
>>> for sentence in s:
>>>     for chunk in sentence.chunks:
>>>         for word in chunk.words:
>>>             print word,
>>>         print

Word(u'The/DT') Word(u'mobile/JJ') Word(u'web/NN')
Word(u'is/VBZ')
Word(u'more/RBR') Word(u'important/JJ')
Word(u'than/IN')
Word(u'mobile/JJ') Word(u'apps/NNS')
</pre></div>
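<p>The slash-delimited string printed by <span class="inline_code">parse()</span> earlier can also be unpacked by hand. The sketch below assumes the field order word/tag/chunk/pnp/relation/lemma, read off the example output and the table above. The helper name <span class="inline_code">split_token</span> is hypothetical, and real parser output may contain cases (e.g., escaped slashes) that this does not handle.</p>

```python
FIELDS = ('word', 'tag', 'chunk', 'pnp', 'relation', 'lemma')

def split_token(token):
    """Split one 'word/tag/chunk/pnp/relation/lemma' token into a dict."""
    parts = token.split('/')
    if len(parts) != len(FIELDS):
        raise ValueError('expected %d fields: %r' % (len(FIELDS), token))
    return dict(zip(FIELDS, parts))

tagged = 'The/DT/B-NP/O/NP-SBJ-1/the mobile/JJ/I-NP/O/NP-SBJ-1/mobile'
tokens = [split_token(t) for t in tagged.split(' ')]

for t in tokens:
    # 'B-' marks the first word of a chunk, 'I-' a word inside it.
    print('%-8s %-4s %-6s %s' % (t['word'], t['tag'], t['chunk'], t['lemma']))
```

<p>For anything beyond a quick inspection, the parse tree shown above is the more convenient interface, since it already groups words into chunks.</p>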
|
|
|
<p>Parsers for Spanish, French, Italian, German and Dutch are also available: <br /><a href="pattern-es.html">pattern.es</a> | <a href="pattern-fr.html">pattern.fr</a> | <a href="pattern-it.html">pattern.it</a> | <a href="pattern-de.html">pattern.de</a> | <a href="pattern-nl.html">pattern.nl</a></p>
|
|
|
<h3>pattern.search</h3>
|
|
|
<p>The <a href="pattern-search.html">pattern.search</a> module contains a search algorithm to retrieve sequences of words (called <em>n-grams</em>) from tagged text.</p>
|
|
|
<div class="example">
|
|
|
<pre class="brush:python; gutter:false; light:true;">>>> from pattern.en import parsetree
>>> from pattern.search import search
>>>
>>> s = 'The mobile web is more important than mobile apps.'
>>> s = parsetree(s, relations=True, lemmata=True)
>>>
>>> for match in search('NP be RB?+ important than NP', s):
>>>     print match.constituents()[-1], '=>', \
>>>           match.constituents()[0]

Chunk('mobile apps/NP') => Chunk('The mobile web/NP-SBJ-1')
</pre></div>
|
|
|
<p>The search pattern <span class="inline_code">NP</span> <span class="inline_code">be</span> <span class="inline_code">RB?+</span> <span class="inline_code">important</span> <span class="inline_code">than</span> <span class="inline_code">NP</span> means any noun phrase (<span class="postag">NP</span>) followed by the verb <em>to be</em>, followed by zero or more adverbs (<span class="postag">RB</span>, e.g., <em>much</em>, <em>more</em>), followed by the words <em>important than</em>, followed by any noun phrase. It will also match "<em>The mobile web <span style="text-decoration: underline;">will</span> <span style="text-decoration: underline;">be</span> <span style="text-decoration: underline;">much</span> <span style="text-decoration: underline;">less</span> important than mobile apps</em>" and other grammatical variations.</p>
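<p>To make the pattern semantics concrete, here is a much simplified stand-in written with the standard library only. It is not how pattern.search is implemented: it works on hand-written (word, tag, lemma) triples, ignores chunks, and only handles the middle part of the pattern (<em>be</em>, optional adverbs, <em>important than</em>). The function name <span class="inline_code">find_comparison</span> is hypothetical.</p>

```python
def find_comparison(tokens):
    """Return (start, end) indices of 'be RB* important than', or None.

    tokens is a list of (word, tag, lemma) triples.
    """
    for i, (word, tag, lemma) in enumerate(tokens):
        if lemma != 'be':
            continue
        j = i + 1
        # Skip zero or more adverbs (RB, RBR, RBS all start with 'RB').
        while j != len(tokens) and tokens[j][1].startswith('RB'):
            j += 1
        words = [w.lower() for w, _, _ in tokens[j:j + 2]]
        if words == ['important', 'than']:
            return i, j + 2
    return None

# Hand-tagged variant of the example sentence.
sentence = [
    ('The', 'DT', 'the'), ('mobile', 'JJ', 'mobile'), ('web', 'NN', 'web'),
    ('will', 'MD', 'will'), ('be', 'VB', 'be'), ('much', 'RB', 'much'),
    ('less', 'RBR', 'less'), ('important', 'JJ', 'important'),
    ('than', 'IN', 'than'), ('mobile', 'JJ', 'mobile'), ('apps', 'NNS', 'app'),
]
span = find_comparison(sentence)
```

<p>Matching on the lemma rather than the word is what lets <em>is</em>, <em>was</em> and <em>will be</em> all satisfy the same <span class="inline_code">be</span> constraint.</p>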
|
|
|
<h3>pattern.vector</h3>
|
|
|
<p>The <a href="pattern-vector.html">pattern.vector</a> module is a toolkit for machine learning, based on a vector space model of bag-of-words documents with weighted features (e.g., tf-idf) and distance metrics (e.g., cosine similarity, infogain). Models can be used for clustering (<em>k</em>-means, hierarchical), classification (Naive Bayes, Perceptron, <em>k</em>-NN, SVM) and latent semantic analysis (LSA).</p>
|
|
|
<div>
|
|
|
<div class="example">
|
|
|
<pre class="brush: python;gutter: false; fontsize: 100; first-line: 1; ">>>> from pattern.web import Twitter
>>> from pattern.en import tag
>>> from pattern.vector import KNN, count
>>>
>>> twitter, knn = Twitter(), KNN()
>>>
>>> for i in range(1, 10):
>>>     for tweet in twitter.search('#win OR #fail', start=i, count=100):
>>>         s = tweet.text.lower()
>>>         p = '#win' in s and 'WIN' or 'FAIL'
>>>         v = tag(s)
>>>         v = [word for word, pos in v if pos == 'JJ'] # JJ = adjective
>>>         v = count(v)
>>>         if v:
>>>             knn.train(v, type=p)
>>>
>>> print knn.classify('sweet potato burger')
>>> print knn.classify('stupid autocorrect')

'WIN'
'FAIL' </pre></div>
|
|
|
</div>
|
|
|
<p>This example trains a classifier on adjectives mined from Twitter. First, tweets with hashtag #win or #fail are mined. For example: <em>"$20 tip off a <span style="text-decoration: underline;">sweet</span> <span style="text-decoration: underline;">little</span> <span style="text-decoration: underline;">old</span> lady today #win"</em>. The word part-of-speech tags are parsed, keeping only adjectives. Each tweet is transformed to a vector, a dictionary of adjective → count items, labeled <span class="inline_code">WIN</span> or <span class="inline_code">FAIL</span>. The classifier uses the vectors to learn which other, unknown tweets look more like <span class="inline_code">WIN</span> (e.g., <em>sweet potato burger</em>) or more like <span class="inline_code">FAIL</span> (e.g., <em>stupid autocorrect</em>).</p>
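<p>The idea behind the classifier can be sketched without Pattern. The code below is a toy nearest-neighbour classifier over adjective count dictionaries using cosine similarity. It is an illustration under simplifying assumptions (k=1, four hand-made training vectors), not the <span class="inline_code">KNN</span> class from pattern.vector, and all names in it are hypothetical.</p>

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse {feature: count} vectors."""
    dot = sum(a[f] * b[f] for f in a if f in b)
    norm = sqrt(sum(v * v for v in a.values())) * \
           sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(vector, examples):
    """Return the label of the most similar training example (k=1)."""
    best = max(examples, key=lambda ex: cosine(vector, ex[0]))
    return best[1]

# Hand-made training data: (adjective counts, label).
examples = [
    (Counter(['sweet', 'little', 'old']), 'WIN'),
    (Counter(['awesome', 'sweet']), 'WIN'),
    (Counter(['stupid', 'slow']), 'FAIL'),
    (Counter(['broken', 'stupid']), 'FAIL'),
]

label_a = classify(Counter(['sweet']), examples)
label_b = classify(Counter(['stupid']), examples)
```

<p>With this toy data <span class="inline_code">label_a</span> is <span class="inline_code">WIN</span> and <span class="inline_code">label_b</span> is <span class="inline_code">FAIL</span>, because each query shares an adjective only with training vectors of that label.</p>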
|
|
|
<h3>pattern.graph</h3>
|
|
|
<p>The <a href="pattern-graph.html">pattern.graph</a> module provides a graph data structure that represents relations between nodes (e.g., terms, concepts). Graphs can be exported as HTML <span class="inline_code">&lt;canvas&gt;</span> animations (<span class="link-maintenance"><a href="http://www.clips.ua.ac.be/media/pattern-graph" target="_blank">demo</a></span>). In the example below, more <em>central</em> nodes (= more incoming traffic) are colored in blue.</p>
|
|
|
<p><img class="border" src="../g/pattern_graph5.jpg" alt="" width="610" height="198" /></p>
|
|
|
<div class="example">
|
|
|
<pre class="brush:python; gutter:false; light:true;">>>> from pattern.web import Bing, plaintext
>>> from pattern.en import parsetree
>>> from pattern.search import search
>>> from pattern.graph import Graph
>>>
>>> g = Graph()
>>> for i in range(10):
>>>     for r in Bing().search('"more important than"', start=i+1, count=50):
>>>         s = r.text.lower()
>>>         s = plaintext(s)
>>>         s = parsetree(s)
>>>         p = '{NP} (VP) more important than {NP}'
>>>         for m in search(p, s):
>>>             x = m.group(1).string # NP left
>>>             y = m.group(2).string # NP right
>>>             if x not in g:
>>>                 g.add_node(x)
>>>             if y not in g:
>>>                 g.add_node(y)
>>>             g.add_edge(g[x], g[y], stroke=(0,0,0,0.75)) # R,G,B,A
>>>
>>> g = g.split()[0] # Largest subgraph.
>>>
>>> for n in g.sorted()[:40]: # Sort by Node.weight.
>>>     n.fill = (0, 0.5, 1, 0.75 * n.weight)
>>>
>>> g.export('test', directed=True, weighted=0.6) </pre></div>
|
|
|
<p>Some relations (= edges) could use some extra post-processing, e.g., in <em>nothing is more important than life</em>, <em>nothing</em> is <span style="text-decoration: underline;">not</span> more important than <em>life</em>.</p>
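<p>The notion of centrality as incoming traffic can be illustrated with a plain in-degree count. This is a simplified stand-in using only the standard library and made-up edges. pattern.graph computes its own node weights, which this sketch does not reproduce, and the function name <span class="inline_code">indegree</span> is hypothetical.</p>

```python
from collections import Counter

def indegree(edges):
    """Count incoming edges per node for a list of (source, target) pairs."""
    counts = Counter()
    for source, target in edges:
        counts[source] += 0  # Make sure source nodes appear with count 0.
        counts[target] += 1
    return counts

# Made-up "x is more important than y" relations, stored as x -> y.
edges = [
    ('life', 'money'),
    ('family', 'money'),
    ('health', 'money'),
    ('health', 'work'),
]
ranking = indegree(edges).most_common()
most_central = ranking[0][0]
```

<p>Here <span class="inline_code">most_central</span> is <span class="inline_code">'money'</span>, the node that most other nodes point to. A node like <em>nothing</em> in the sentence above would need to be filtered out before counting.</p>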
|
|
|
<p> </p>
|
|
|
<hr />
|
|
|
<h2>Case studies </h2>
|
|
|
<p>Case studies with hands-on source code examples.</p>
|
|
|
<table border="0">
|
|
|
<tbody>
|
|
|
<tr>
|
|
|
<td>
|
|
|
<p><a href="http://www.clips.ua.ac.be/pages/modeling-creativity-with-a-semantic-network-of-common-sense"><img src="../g/pattern_example_semantic_network.jpg" alt="" width="70" height="70" /><br /></a></p>
|
|
|
</td>
|
|
|
<td> </td>
|
|
|
<td><span class="smallcaps">modeling creativity with a semantic network of common sense </span><span class="small">(2013)</span> <br />This case study offers a computational model of creativity, by representing the mind as a semantic network of common sense, using <a class="link-maintenance" href="pattern-graph.html">pattern.graph</a> &amp; <a class="link-maintenance" href="pattern-web.html">web</a>.<br /><a href="http://www.clips.ua.ac.be/pages/modeling-creativity-with-a-semantic-network-of-common-sense">read more »</a></td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>
|
|
|
<p><a class="noexternal" href="http://www.clips.ua.ac.be/pages/using-wiktionary-to-build-an-italian-part-of-speech-tagger"><img src="../g/pattern_example_italian.jpg" alt="" width="70" height="70" /><br /></a></p>
|
|
|
</td>
|
|
|
<td> </td>
|
|
|
<td><span class="smallcaps">using wiktionary to build an italian part-of-speech tagger </span><span class="small">(2013)</span> <br />This case study demonstrates how a part-of-speech tagger for Italian (see <a class="link-maintenance" href="pattern-it.html">pattern.it</a>) can be built by mining Wiktionary and Wikipedia. <br /><a href="http://www.clips.ua.ac.be/pages/using-wiktionary-to-build-an-italian-part-of-speech-tagger">read more »</a></td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>
|
|
|
<p><a class="noexternal" href="http://www.clips.ua.ac.be/pages/using-wikicorpus-nltk-to-build-a-spanish-part-of-speech-tagger"><img src="../g/pattern_example_spanish.jpg" alt="" width="70" height="70" /><br /></a></p>
|
|
|
</td>
|
|
|
<td> </td>
|
|
|
<td><span class="smallcaps">using wikicorpus and nltk to build a spanish part-of-speech tagger </span><span class="small">(2012)</span><br />This case study demonstrates how a part-of-speech tagger for Spanish (see <a class="link-maintenance" href="pattern-es.html">pattern.es</a>) can be built by using NLTK and the freely available Wikicorpus. <br /><a href="http://www.clips.ua.ac.be/pages/using-wikicorpus-nltk-to-build-a-spanish-part-of-speech-tagger">read more »</a></td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>
|
|
|
<p><a class="noexternal" href="http://www.clips.ua.ac.be/pages/pattern-examples-elections"><img src="../g/pattern_example_elections.jpg" alt="" width="70" height="70" /><br /></a></p>
|
|
|
</td>
|
|
|
<td> </td>
|
|
|
<td><span class="smallcaps">belgian elections</span><span class="smallcaps">, twitter sentiment analysis </span><span class="small">(2010)</span><br />This case study uses sentiment analysis (e.g., positive or negative tone) on 7,500 Dutch and French tweets (see <a class="link-maintenance" href="pattern-web.html">pattern.web</a> | <a class="link-maintenance" href="pattern-nl.html">nl</a> | <a class="link-maintenance" href="pattern-fr.html">fr</a>) in the weeks before the Belgian 2010 elections. <br /><a href="http://www.clips.ua.ac.be/pages/pattern-examples-elections">read more »</a></td>
|
|
|
</tr>
|
|
|
<tr>
|
|
|
<td>
|
|
|
<p><a class="noexternal" href="http://www.clips.ua.ac.be/pages/pattern-examples-100days"><img src="../g/pattern_example_100days.jpg" alt="" width="70" height="70" /><br /></a></p>
|
|
|
</td>
|
|
|
<td> </td>
|
|
|
<td><span class="smallcaps">web mining and visualization </span><span class="small">(2010)</span><br />This case study uses a number of different approaches to mine, correlate and visualize about 6,000 Google News items and 70,000 tweets. <br /><a href="http://www.clips.ua.ac.be/pages/pattern-examples-100days">read more »</a></td>
|
|
|
</tr>
|
|
|
</tbody>
|
|
|
</table>
|
|
|
</div>
|
|
|
</div></div>
|
|
|
</div>
|
|
|
</div>
|
|
|
</div>
|
|
|
</div>
|
|
|
</div>
|
|
|
</div>
|
|
|
<script>
|
|
|
SyntaxHighlighter.all();
|
|
|
</script>
|
|
|
</body>
|
|
|
</html> |