thrid updates

master
bootje 4 years ago
parent c6e1d2215d
commit 2b81f12b04

@ -1,57 +0,0 @@
import collections
# this script was adapted from:
# https://towardsdatascience.com/very-simple-python-script-for-extracting-most-common-words-from-a-story-1e3570d0b9d0
# https://git.xpub.nl/rita/categorization_of_files/src/branch/master/categorization.py
# open and read file
file = open(input("\nwhich platform's Terms of Service do you want to look at: \n"), encoding="utf8")
a = file.read()
# my stopwords are common words I don't want to count, like "a", "an", "the".
stopwords = set(line.strip() for line in open('stopwords.txt'))
# dictionary
wordcount = {}
# splitting words from punctuation so "book" and "book!" count as the same word
for word in a.lower().split():
    word = word.replace(".","")
    word = word.replace(",","")
    word = word.replace(":","")
    word = word.replace("\"","")
    word = word.replace("!","")
    word = word.replace("“","")
    word = word.replace("‘","")
    word = word.replace("*","")
    # counting
    if word not in stopwords:
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
# print x most common words
# n_print = int(input("How many most common words to print: "))
n_print = int(5)
print("\nMost used colonial words are:")
word_counter = collections.Counter(wordcount)
for word, count in word_counter.most_common(n_print):
    print(word,"", count)
# categories
# words that are inside the category Library Studies
library_studies = set(line.strip() for line in open('library_studies.txt'))
for word, count in word_counter.most_common(n_print):
    if word in library_studies:
        print("\nWe suggest the following categorization for this file:\nLibrary Studies\n")
        break
else:
    print("\nWe don't have any suggestion of categorization for this file.\n")
# Close the file
file.close()
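
The counting loop in the script above can be written more compactly. The following is a minimal sketch, not the author's method: it assumes the same set of stripped characters as the script (the closing curly quotes are an addition), and the names `STRIP_TABLE` and `count_words` are hypothetical.

```python
import collections

# Characters removed from each word. The list mirrors the replace() calls
# in the script; the closing curly quotes are an assumption added here.
STRIP_TABLE = str.maketrans("", "", '.,:"!“”‘’*()')


def count_words(text, stopwords):
    """Return a Counter of lowercased words, skipping stopwords."""
    counter = collections.Counter()
    for raw in text.lower().split():
        word = raw.translate(STRIP_TABLE)
        # Skip tokens that were pure punctuation, and stopwords.
        if word and word not in stopwords:
            counter[word] += 1
    return counter


sample = "You must agree. You must use the Services, and you agree!"
counts = count_words(sample, stopwords={"the", "and"})
print(counts.most_common(2))
```

`collections.Counter` handles the "first time seen" case itself, so the nested `if`/`else` around `wordcount[word]` is no longer needed.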

@ -1,82 +0,0 @@
import collections
# from termcolor import colored
# this script was adapted from:
# https://towardsdatascience.com/very-simple-python-script-for-extracting-most-common-words-from-a-story-1e3570d0b9d0
# https://git.xpub.nl/rita/categorization_of_files/src/branch/master/categorization.py
# open and read file
file = open(input("\nwhich platform's Terms of Service do you want to look at: \n"), encoding="utf8")
a = file.read()
# f = open("tiktok.txt", "r")
# print(f.read())
# my stopwords are common words I don't want to count, like "a", "an", "the".
stopwords = set(line.strip() for line in open('stopwords.txt'))
# dictionary
wordcount = {}
# splitting words from punctuation so "book" and "book!" count as the same word
for word in a.lower().split():
    word = word.replace(".","")
    word = word.replace(",","")
    word = word.replace(":","")
    word = word.replace("\"","")
    word = word.replace("!","")
    word = word.replace("“","")
    word = word.replace("‘","")
    word = word.replace("*","")
    word = word.replace("(","")
    word = word.replace(")","")
    # counting
    if word not in stopwords:
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
# print x most common words
n_print = int(100)
print("\nMost used colonial words are:")
word_counter = collections.Counter(wordcount)
for word, count in word_counter.most_common(n_print):
    print(word,"", count)
# word_counter = collections.Counter(wordcount)
# for word, count in word_counter.most_common(n_print):
# print(word,"—", count)
# colonial texts in bold
# for word in n_print:
# if word in n_print:
# wordcount.append(colored(word, 'white', 'on_red'))
# else:
# wordcount.append(t)
# print(" ".join(colored(word, 'white', 'on_red'))
# categories
# words that are inside the category Library Studies
library_studies = set(line.strip() for line in open('library_studies.txt'))
for word, count in word_counter.most_common(n_print):
    if word in library_studies:
        print("\nWe suggest the following categorization for this platform:\nLibrary Studies\n")
        break
else:
    print("\nThese are the TikTok's colonial words.\n")
# Close the file
file.close()
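
The commented-out block in the script above tries to print flagged words in white on red. A minimal sketch of that idea follows, assuming a terminal that understands ANSI escape codes; it uses raw escape sequences rather than `termcolor`, and the function name `highlight` and the sample word set are hypothetical.

```python
# ANSI escape codes: bright white text on a red background, then reset.
HIGHLIGHT = "\033[97;41m"
RESET = "\033[0m"


def highlight(text, flagged):
    """Wrap every word of `text` that appears in `flagged` in colour codes."""
    out = []
    for token in text.split():
        # Compare on a lowercased copy with surrounding punctuation removed,
        # but keep the original token in the output.
        core = token.strip('.,:;"!?()“”‘’*').lower()
        out.append(f"{HIGHLIGHT}{token}{RESET}" if core in flagged else token)
    return " ".join(out)


print(highlight("You must agree to use the Services.", {"must", "agree", "use"}))
```

Because the text is rebuilt with single spaces, the original line breaks are lost; a version that preserves layout would need to split on whitespace more carefully.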

@ -1,19 +0,0 @@
archives
author
bibliographic
bibliotheca
book
bookcase
books
bookshelf
bookstore
catalogue
e-book
librarian
librarianship
library
literature
manuscripts
papyrus
read
reading
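
A word list like the one above can drive the categorization step for more than one category at once. This is a sketch under assumptions: the function name `suggest_categories` is hypothetical, and the "Legal" category and its words are invented for illustration only.

```python
def suggest_categories(top_words, categories):
    """Return the names of categories that share a word with top_words."""
    top = set(top_words)
    return [name for name, vocab in categories.items() if top & vocab]


categories = {
    # A few entries taken from the Library Studies word list.
    "Library Studies": {"book", "library", "reading"},
    # Hypothetical second category, not part of the repository.
    "Legal": {"agreement", "liability"},
}

print(suggest_categories(["services", "library", "use"], categories))
```

Returning a list rather than printing inside the loop makes the "no suggestion" case an empty list, which the caller can handle in one place.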

@ -1,3 +0,0 @@
Any questions, comments, suggestions, ideas, original or creative materials or other information you submit about FaceApp or our products or Services (collectively, “Feedback”), is non-confidential and we have no obligations (including without limitation obligations of confidentiality) with respect to such Feedback. You hereby grant to FaceApp a fully paid, royalty-free, perpetual, irrevocable, worldwide, non-exclusive, and fully sublicensable right and license to use, reproduce, perform, display, distribute, adapt, modify, re-format, create derivative works of, and otherwise commercially or non-commercially exploit in any manner, any and all Feedback, and to sublicense the foregoing rights, in connection with the operation and maintenance of the Services and/or FaceApps business.
- If you choose to login to the Services via a third-party platform or social media network, you will need to use your credentials (e.g., username and password) from a third-party online platform. You must maintain the security of your third party account and promptly notify us if you discover or suspect that someone has accessed your account without your permission. If you permit others to use your account credentials, you are responsible for the activities of such users that occur in connection with your account.

@ -1,67 +0,0 @@
-
a
about
all
an
and
are
as
at
be
but
by
can
do
for
from
get
had
has
have
he
I
i
if
in
into
is
it
its
me
more
my
not
of
on
one
or
other
out
so
some
such
than
that
the
their
them
then
there
these
they
this
those
to
up
was
were
what
when
which
who
whom
will
with
would
|
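
Since the counting scripts lowercase every word before the stopword check, the list above can be normalised while it is loaded. A minimal sketch, assuming one entry per line in a UTF-8 file; `load_wordlist` is a hypothetical helper, and the temporary file only stands in for `stopwords.txt`.

```python
import os
import tempfile


def load_wordlist(path):
    """Read one word per line, lowercased, skipping blank lines."""
    with open(path, encoding="utf8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


# Demonstration with a throwaway file standing in for stopwords.txt.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stopwords.txt")
    with open(path, "w", encoding="utf8") as handle:
        handle.write("a\nI\ni\n\nthe\n")
    words = load_wordlist(path)

print(sorted(words))
```

The `with` block closes the file even if reading fails, and the set collapses "I" and "i" into a single entry.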

@ -1,3 +0,0 @@
Any questions, comments, suggestions, ideas, original or creative materials or other information you submit about FaceApp or our products or Services (collectively, “Feedback”), is non-confidential and we have no obligations (including without limitation obligations of confidentiality) with respect to such Feedback. You hereby grant to FaceApp a fully paid, royalty-free, perpetual, irrevocable, worldwide, non-exclusive, and fully sublicensable right and license to use, reproduce, perform, display, distribute, adapt, modify, re-format, create derivative works of, and otherwise commercially or non-commercially exploit in any manner, any and all Feedback, and to sublicense the foregoing rights, in connection with the operation and maintenance of the Services and/or FaceApps business.
- If you choose to login to the Services via a third-party platform or social media network, you will need to use your credentials (e.g., username and password) from a third-party online platform. You must maintain the security of your third party account and promptly notify us if you discover or suspect that someone has accessed your account without your permission. If you permit others to use your account credentials, you are responsible for the activities of such users that occur in connection with your account.

@ -12,4 +12,4 @@ To activate a virtual environment
====================================
cd to the folder where "venv" is and...
source venb/bin/activate
source venv/bin/activate

@ -2,39 +2,26 @@ import sys
import codecs
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
# NLTK's default English stopwords
default_stopwords = set(nltk.corpus.stopwords.words('english'))
from nltk import sent_tokenize, word_tokenize, pos_tag
#read stop words from a file (one stopword per line, UTF-8)
stopwords_file = './stopwords.txt'
custom_stopwords = set(codecs.open('stopwords.txt', 'r', 'utf-8').read().splitlines())
all_stopwords = default_stopwords | custom_stopwords
#open the txt file, read, and tokenize
file = open('faceapp.txt','r')
raw = file.read()
tokens = nltk.word_tokenize(raw)
faceapp = nltk.Text(tokens)
faceapp.concordance('services')
# Remove single-character tokens (mostly punctuation)
tokens = [word for word in tokens if len(word) > 1]
# Remove numbers
tokens = [word for word in tokens if not word.isnumeric()]
# Lowercase all words (default_stopwords are lowercase too)
tokens = [word.lower() for word in tokens]
# Remove stopwords
tokens = [word for word in tokens if word not in all_stopwords]
# Calculate frequency distribution
fdist = nltk.FreqDist(tokens)
# Output top 10 words
for word, frequency in fdist.most_common(10):
    print(u'{};{}'.format(word, frequency))
# pos_tag = [word_tokenize(sent) for sent in sent_tokenize(raw)]
pos_tag = [pos_tag(word_tokenize(sent))for sent in sent_tokenize(raw)]
print(pos_tag)

@ -1,17 +1,20 @@
EPISTEMIC = "epistemic" # Expresses degree of coloniality.
# 100.00 = Extreme level of coloniality
# 90.00 =
# 80.00 =
# 70.00 =
# 60.00 =
# gradation of intensity words
# 100.00 = absolute level of coloniality
# 90.00 = extreme level of coloniality
# 80.00 = heavy level of coloniality
# 70.00 = high level of coloniality
# 60.00 = significant level of coloniality
# 50.00 =
# 40.00 =
# 30.00 =
# 20.00 =
# 10.00 =
# 0.00 = Neutral level of coloniality
# 40.00 = relative level of coloniality
# 30.00 = moderate level of coloniality
# 20.00 = reasonable level of coloniality
# 10.00 = fair level of coloniality
# 0.00 = neutral level of coloniality
# lists of part of speech
#MD = would, could...
#RB = adverb 'very', 'slightly'...
#VB = verb
@ -37,36 +40,19 @@ epistemic_MD = { # would => could => can => should => shall => will => must
}
epistemic_MD = {
100.00: d(),
90.00: d(),
80.00: d(),
70.00: d(),
60.00: d(),
50.00: d(),
40.00: d(),
30.00: d(),
20.00: d(),
10.00: d(),
0.00: d(),
}
epistemic_MD = {
100.00: d(),
90.00: d(),
80.00: d(),
70.00: d(),
60.00: d(),
epistemic_VB = { #verbs from FaceApp ToS
100.00: d("must", "agree","use"),
90.00: d("use", "bound", "access", "allow", "acknowledge", "reproduce"),
80.00: d("choose", "claim", "permit", "collect"),
70.00: d("change"),
60.00: d("create"),
50.00: d(),
40.00: d(),
30.00: d(),
20.00: d(),
10.00: d(),
40.00: d("maintain"),
30.00: d("support"),
20.00: d("identify"),
10.00: d("may"),
0.00: d(),
}
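
One way to apply a scale like `epistemic_VB` to a tokenized text is sketched below. The assumptions are loud ones: the diff does not show how the scale is consumed, so averaging the scores of matched words is an illustrative choice, plain sets stand in for the `d()` helper, and `EPISTEMIC_VB_SAMPLE`, `build_lookup` and `coloniality` are hypothetical names.

```python
# Shortened copy of the verb scale; sets stand in for the d() helper.
EPISTEMIC_VB_SAMPLE = {
    100.0: {"must", "agree", "use"},
    90.0: {"use", "bound", "access", "allow", "acknowledge", "reproduce"},
    80.0: {"choose", "claim", "permit", "collect"},
    10.0: {"may"},
}


def build_lookup(scale):
    """Invert the scale into word -> score, keeping the highest score
    when a word (here "use") is listed at more than one level."""
    lookup = {}
    for score, words in scale.items():
        for word in words:
            lookup[word] = max(score, lookup.get(word, 0.0))
    return lookup


def coloniality(tokens, lookup):
    """Average score of the tokens found in the lookup (assumption:
    a plain mean; unmatched tokens are ignored)."""
    scores = [lookup[t] for t in tokens if t in lookup]
    return sum(scores) / len(scores) if scores else 0.0


lookup = build_lookup(EPISTEMIC_VB_SAMPLE)
tokens = "you must agree and you may choose".split()
print(coloniality(tokens, lookup))
```

Here "must" and "agree" score 100, "choose" 80 and "may" 10, so the mean over the four matched words is 72.5.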

@ -1,3 +1,13 @@
Terms of Service
1. Eligibility
You must be at least 13 years of age to access or use our Services. If you are under 18 years of age (or the age of legal majority where you live), you may only access or use our Services under the supervision of a parent or legal guardian who agrees to be bound by this Agreement. If you are a parent or legal guardian of a user under the age of 18 (or the age of legal majority), you agree to be fully responsible for the acts or omissions of such user in connection with our Services. If you are accessing or using our Services on behalf of another person or entity, you represent that you are authorized to accept this Agreement on that person or entity's behalf and that the person or entity agrees to be responsible to us if you or the other person or entity violates this Agreement.
2. User Accounts and Account Security
If you choose to login to the Services via a third-party platform or social media network, you will need to use your credentials (e.g., username and password) from a third-party online platform. You must maintain the security of your third party account and promptly notify us if you discover or suspect that someone has accessed your account without your permission. If you permit others to use your account credentials, you are responsible for the activities of such users that occur in connection with your account.
3. Privacy
Please refer to our Privacy Policy for information about how we collect, use and disclose information about you.
4. User Content
Our Services may allow you and other users to create, post, store and share content, including photos, videos, messages, text, software and other materials (collectively, “User Content”). User Content does not include user-generated filters. Subject to this Agreement and the Privacy Policy, you retain all rights in and to your User Content, as between you and FaceApp. Further, FaceApp does not claim ownership of any User Content that you post on or through the Services. You grant FaceApp a nonexclusive, royalty-free, worldwide, fully paid license to use, reproduce, modify, adapt, create derivative works from, distribute, perform and display your User Content during the term of this Agreement solely to provide you with the Services.
@ -7,4 +17,221 @@ You represent and warrant that: (i) you own or otherwise have the right to use t
You may not create, post, store or share any User Content that violates this Agreement or for which you do not have all the rights necessary to grant us the license described above. Although we have no obligation to screen, edit or monitor User Content, we may delete or remove User Content at any time and for any reason.
FaceApp is not a backup service and you agree that you will not rely on the Services for the purposes of User Content backup or storage. FaceApp will not be liable to you for any modification, suspension, or discontinuation of the Services, or the loss of any User Content.
FaceApp is not a backup service and you agree that you will not rely on the Services for the purposes of User Content backup or storage. FaceApp will not be liable to you for any modification, suspension, or discontinuation of the Services, or the loss of any User Content.
5. Prohibited Conduct and Content
You will not violate any applicable law, contract, intellectual property or other third-party right or commit a tort, and you are solely responsible for your conduct while accessing or using our Services. You will not:
Engage in any harassing, threatening, intimidating, predatory or stalking conduct;
Use or attempt to use another user's account without authorization from that user and FaceApp;
Use our Services in any manner that could interfere with, disrupt, negatively affect or inhibit other users from fully enjoying our Services or that could damage, disable, overburden or impair the functioning of our Services in any manner;
Reverse engineer any aspect of our Services or do anything that might discover source code or bypass or circumvent measures employed to prevent or limit access to any part of our Services;
Attempt to circumvent any content-filtering techniques we employ or attempt to access any feature or area of our Services that you are not authorized to access;
Develop or use any third-party applications that interact with our Services without our prior written consent, including any scripts designed to scrape or extract data from our Services;
Use our Services for any illegal or unauthorized purpose, or engage in, encourage or promote any activity that violates this Agreement.
You may also only post or otherwise share User Content that is non-confidential and you have all necessary rights to disclose. You may not create, post, store or share any User Content that:
Is unlawful, libelous, defamatory, obscene, pornographic, indecent, lewd, suggestive, harassing, threatening, invasive of privacy or publicity rights, abusive, inflammatory or fraudulent;
Would constitute, encourage or provide instructions for a criminal offense, violate the rights of any party or otherwise create liability or violate any local, state, national or international law;
May infringe any patent, trademark, trade secret, copyright or other intellectual or proprietary right of any party;
Contains or depicts any statements, remarks or claims that do not reflect your honest views and experiences;
Impersonates, or misrepresents your affiliation with, any person or entity;
Contains any unsolicited promotions, political campaigning, advertising or solicitations;
Contains any private or personal information of a third party without such third party's consent;
Contains any viruses, corrupted data or other harmful, disruptive or destructive files or content; or
Is, in our sole judgment, objectionable or that restricts or inhibits any other person from using or enjoying our Services, or that may expose FaceApp or others to any harm or liability of any type.
In addition, although we have no obligation to screen, edit or monitor User Content, we may delete or remove User Content at any time and for any reason.
6. Limited License; Copyright and Trademark
Our Services and the text, graphics, images, photographs, videos, illustrations, trademarks, trade names, page headers, button icons, scripts, service marks, logos, slogans, filters, user generated filters and other content contained therein (collectively, the “FaceApp Content”) are owned by or licensed to FaceApp and are protected under both United States and foreign laws. Except as explicitly stated in this Agreement, FaceApp and our licensors reserve all rights in and to our Services and the FaceApp Content. You are hereby granted a limited, nonexclusive, nontransferable, non-sublicensable, revocable license to access and use our Services and FaceApp Content for your own personal use; however, such license is subject to this Agreement and does not include any right to: (a) sell, resell or commercially use our Services or FaceApp Content; (b) copy, reproduce, distribute, publicly perform or publicly display FaceApp Content, except as expressly permitted by us or our licensors; (c) modify the FaceApp Content, remove any proprietary rights notices or markings, or otherwise make any derivative uses of our Services or FaceApp Content, except as expressly set forth in this Agreement; (d) use any data mining, robots or similar data gathering or extraction methods; or (e) use our Services or FaceApp Content other than as expressly provided in this Agreement. Any use of our Services or FaceApp Content other than as specifically authorized herein, without our prior written permission, is strictly prohibited and will terminate the license granted under this Agreement. You will not remove, alter or conceal any copyright, trademark, service mark or other proprietary rights notices incorporated in or accompanying the FaceApp Content.
7. Feedback
Any questions, comments, suggestions, ideas, original or creative materials or other information you submit about FaceApp or our products or Services (collectively, “Feedback”), is non-confidential and we have no obligations (including without limitation obligations of confidentiality) with respect to such Feedback. You hereby grant to FaceApp a fully paid, royalty-free, perpetual, irrevocable, worldwide, non-exclusive, and fully sublicensable right and license to use, reproduce, perform, display, distribute, adapt, modify, re-format, create derivative works of, and otherwise commercially or non-commercially exploit in any manner, any and all Feedback, and to sublicense the foregoing rights, in connection with the operation and maintenance of the Services and/or FaceApp's business.
8. Copyright Complaints
We have a policy of limiting access to our Services and terminating the accounts of users who repeatedly infringe the intellectual property copyright rights of others upon prompt notification to us by the copyright owner or the copyright owner's legal agent. Without limiting the foregoing, if you believe that your work has been copied and posted on or through the Services in a way that constitutes copyright infringement, please provide our Copyright Agent with the following information: (a) an electronic or physical signature of the person authorized to act on behalf of the owner of the copyright interest; (b) a description of the copyrighted work that you claim has been infringed; (c) a description of the location on the Services of the material that you claim is infringing; (d) your address, telephone number and e-mail address; (e) a written statement by you that you have a good faith belief that the disputed use is not authorized by the copyright owner, its agent or the law; and (f) a statement by you, made under penalty of perjury, that the above information in your notice is accurate and that you are the copyright owner or authorized to act on the copyright owner's behalf. Contact information for FaceApp's Copyright Agent for notice of claims of infringement is as follows: Yaroslav Goncharov, Designated DMCA Copyright Agent, FaceApp Inc, 1000 N West Street, Suite 1200, Wilmington, Delaware, 19801.
9. Indemnification
To the fullest extent permitted by applicable law, you will indemnify, defend, and hold harmless FaceApp and each of our respective officers, directors, agents, partners and employees (individually and collectively, the “FaceApp Parties”) from and against any loss, liability, claim, demand, damages, expenses or costs (“Claims”) arising out of or related to (a) your access to or use of our Services; (b) your User Content or Feedback; (c) your violation of this Agreement; (d) your violation, misappropriation or infringement of any rights of another (including intellectual property rights or privacy rights); or (e) your conduct in connection with our Services. You agree to promptly notify FaceApp Parties of any third party Claims, cooperate with FaceApp Parties in defending such Claims and pay all fees, costs and expenses associated with defending such Claims (including, but not limited to, attorneys fees). You also agree that the FaceApp Parties will have control of the defense or settlement of any third party Claims. This indemnity is in addition to, and not in lieu of, any other indemnities set forth in a written agreement between you and FaceApp or the other FaceApp Parties.
10. Disclaimers
We do not control, endorse or take responsibility for any User Content or third-party content available on or linked to by our Services.
YOUR USE OF OUR SERVICES IS AT YOUR SOLE RISK. OUR SERVICES ARE PROVIDED “AS IS” AND “AS AVAILABLE” WITHOUT WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE, AND NON-INFRINGEMENT. In addition, FaceApp does not represent or warrant that our Services are accurate, complete, reliable, current or error-free. While FaceApp attempts to make your access to and use of our Services safe, we cannot and do not represent or warrant that our Services or servers are free of viruses or other harmful components. You assume the entire risk as to the quality and performance of the Services.
11. Limitation of Liability
FACEAPP AND THE OTHER FACEAPP PARTIES WILL NOT BE LIABLE TO YOU UNDER ANY THEORY OF LIABILITY—WHETHER BASED IN CONTRACT, TORT, NEGLIGENCE, STRICT LIABILITY, WARRANTY, OR OTHERWISE—FOR ANY INDIRECT, CONSEQUENTIAL, EXEMPLARY, INCIDENTAL, PUNITIVE OR SPECIAL DAMAGES OR LOST PROFITS, EVEN IF FACEAPP OR THE OTHER FACEAPP PARTIES HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
THE TOTAL LIABILITY OF FACEAPP AND THE OTHER FACEAPP PARTIES, FOR ANY CLAIM ARISING OUT OF OR RELATING TO THIS AGREEMENT OR OUR SERVICES, REGARDLESS OF THE FORM OF THE ACTION, IS LIMITED TO THE AMOUNT PAID, IF ANY, BY YOU TO ACCESS OR USE OUR SERVICES.
The limitations set forth in this section will not limit or exclude liability for the gross negligence, fraud or intentional misconduct of FaceApp or the other FaceApp Parties or for any other matters in which liability cannot be excluded or limited under applicable law. Additionally, some jurisdictions do not allow the exclusion or limitation of incidental or consequential damages, so the above limitations or exclusions may not apply to you.
12. Release
To the fullest extent permitted by applicable law, you release FaceApp and the other FaceApp Parties from responsibility, liability, claims, demands, and/or damages (actual and consequential) of every kind and nature, known and unknown (including, but not limited to, claims of negligence), arising out of or related to disputes between users and the acts or omissions of third parties. You expressly waive any rights you may have under California Civil Code § 1542 as well as any other statute or common law principles that would otherwise limit the coverage of this release to include only those claims which you may know or suspect to exist in your favor at the time of agreeing to this release.
13. Transfer and Processing Data
By accessing or using our Services, you acknowledge and, as applicable, consent to the processing, transfer and storage of information about you in and to the United States and other countries.
14. Dispute Resolution; Binding Arbitration Agreement
Please read the following section carefully because it requires users who are U.S. residents to arbitrate certain disputes and claims with FaceApp and limits the manner in which you can seek relief from us.
Applicability of Arbitration Agreement. Except for small claims disputes in which you or FaceApp seek to bring an individual action in small claims court located in the county of your billing address or disputes in which you or FaceApp seeks injunctive or other equitable relief for the alleged unlawful use of intellectual property, you and FaceApp waive your rights to a jury trial and to have any dispute arising out of or related to this Agreement or our Services resolved in court. This Arbitration Agreement shall apply, without limitation, to all disputes or claims and requests for relief that arose or were asserted before the effective date of this Agreement or any prior version of this Agreement.
Arbitration Rules and Forum. The Federal Arbitration Act governs the interpretation and enforcement of this Arbitration Agreement. To begin an arbitration proceeding, you must send a letter requesting arbitration and describing your dispute or claim or request for relief to our registered agent [include name and address of registered agent here]. The arbitration will be resolved through confidential binding arbitration by the Judicial Arbitration and Mediation Services (“JAMS”), an established alternative dispute resolution provider. Disputes involving claims, counterclaims, or requests for relief under $250,000, not inclusive of attorneys' fees and interest, shall be subject to JAMS's most current version of the Streamlined Arbitration Rules and procedures available; all other disputes shall be subject to JAMS's most current version of the Comprehensive Arbitration Rules and Procedures, available at http://www.jamsadr.com/rules-comprehensive-arbitration/. JAMS's rules are also available at www.jamsadr.com or by calling JAMS at 800-352-5267. If JAMS is not available to arbitrate, the parties will select an alternative arbitral forum. If the arbitrator finds that you cannot afford to pay JAMS's filing, administrative, hearing and/or other fees and cannot obtain a waiver from JAMS, FaceApp will pay them for you. In addition, we will reimburse all such JAMS's filing, administrative, hearing and/or other fees for disputes, claims, or requests for relief totaling less than $10,000 unless the arbitrator determines the claims are frivolous. You may choose to have the arbitration conducted by telephone, based on written submissions, or in person in the country where you live or at another mutually agreed location. Any judgment on the award rendered by the arbitrator may be entered in any court of competent jurisdiction.
You may choose to have the arbitration conducted by telephone, based on written submissions or at another mutually agreed location. Any judgment on the award rendered by the arbitrator may be entered in any court of competent jurisdiction.
Authority of Arbitrator. The arbitrator shall have exclusive authority to (a) determine the scope and enforceability of this Arbitration Agreement and (b) resolve any dispute related to the interpretation, applicability, enforceability or formation of this Arbitration Agreement, including, but not limited to, any assertion that all or any part of this Arbitration Agreement is void or voidable. The arbitration will decide the rights and liabilities, if any, of you and FaceApp. The arbitration proceeding will not be consolidated with any other matters or joined with any other cases or parties. The arbitrator shall have the authority to grant motions dispositive of all or part of any claim. The arbitrator shall have the authority to award monetary damages and to grant any non-monetary remedy or relief available to an individual under applicable law, the arbitral forums rules, and the Agreement (including the Arbitration Agreement). The arbitrator shall issue a written award and statement of decision describing the essential findings and conclusions on which the award is based, including the calculation of any damages awarded. The arbitrator has the same authority to award relief on an individual basis that a judge in a court of law would have. The award of the arbitrator is final and binding upon you and us.
Waiver of Jury Trial. YOU AND FACEAPP HEREBY WAIVE ANY CONSTITUTIONAL AND STATUTORY RIGHTS TO SUE IN COURT AND HAVE A TRIAL IN FRONT OF A JUDGE OR A JURY. You and FaceApp are instead electing that all disputes, claims or requests for relief shall be resolved by arbitration under this Arbitration Agreement, except as specified above. An arbitrator can award on an individual basis the same damages and relief as a court and must follow this Agreement as a court would. However, there is no judge or jury in arbitration, and court review of an arbitration award is subject to very limited review.
Waiver of Class or Other Non-Individualized Relief. ALL DISPUTES, CLAIMS AND REQUESTS FOR RELIEF WITHIN THE SCOPE OF THIS ARBITRATION AGREEMENT MUST BE ARBITRATED ON AN INDIVIDUAL BASIS AND NOT ON A CLASS OR COLLECTIVE BASIS. ONLY INDIVIDUAL RELIEF IS AVAILABLE, AND CLAIMS OF MORE THAN ONE USER CANNOT BE ARBITRATED OR CONSOLIDATED WITH THOSE OF ANY OTHER USER. If a decision is issued stating that applicable law precludes enforcement of any of this subsections limitations as to a given dispute, claim or request for relief, then such aspect must be severed from the arbitration and brought into the State or Federal Courts located in the State of California. All other disputes, claims, or requests for relief shall be arbitrated.
30-Day Right to Opt-Out. You have the right to opt out of the provisions of this Arbitration Agreement by sending written notice of your decision to opt-out to: arbitration@faceapp.com, within 30 days after first becoming subject to this Arbitration Agreement. Your notice must include your name and address, your username (if any), the e-mail address you used to set up your account (if you have one), and an unequivocal statement that you want to opt out of this Arbitration Agreement. If you opt out of this Arbitration Agreement, all other parts of this Agreement will continue to apply to you. Opting out of this Arbitration Agreement has no effect on any other arbitration agreements that you may currently have, or may enter in the future, with us.
You and FaceApp agree that the state or federal courts of the State of California and the United States sitting in Santa Clara County, California have exclusive jurisdiction over any appeals and the enforcement of an arbitration award.
Severability. Except as provided in this Section 14 above, if any part or parts of this Arbitration Agreement are found under the law to be invalid or unenforceable, then such specific part or parts shall be of no force and effect and shall be severed, and the remainder of the Arbitration Agreement shall continue in full force and effect.
Survival of Agreement. This Arbitration Agreement will survive the termination of your relationship with FaceApp.
Modification. Notwithstanding any provision in this Agreement to the contrary, we agree that if FaceApp makes any future material change to this Arbitration Agreement you may reject that change within thirty (30) days of such change becoming effective by writing Company at the following address: arbitration@faceapp.com.
15. Governing Law and Venue
This Agreement and your access to and use of our Services will be governed by and construed and enforced in accordance with the laws of California, consistent with the Federal Arbitration Act, without regard to conflict of law rules or principles (whether of California or any other jurisdiction) that would cause the application of the laws of any other jurisdiction. The United Nations Convention for the International Sale of Goods does not apply to the Agreement. Any dispute between the parties that is not subject to arbitration or cannot be heard in small claims court will be resolved in the state or federal courts of California and the United States, respectively, sitting in Santa Clara County, California.
16. Electronic Communications
By accessing or using the Services, you also consent to receive electronic communications from FaceApp (e.g., responses to your requests, questions and feedback, announcements, updates, and security alerts through a push notification or by posting notices on our Services). You agree that any notices, agreements, disclosures or other communications that we send to you electronically will satisfy any legal communication requirements, including, but not limited to, that such communications be in writing.
17. Termination
We reserve the right, without notice and in our sole discretion, to terminate your right to access or use our Services. We are not responsible for any loss or harm related to your inability to access or use our Services.
18. Severability
If any provision or part of a provision of this Agreement is unlawful, void or unenforceable, that provision or part of the provision is deemed severable from this Agreement and does not affect the validity and enforceability of any remaining provisions.
19. Additional Terms Applicable to iOS Devices
The following terms apply if you install, access or use the Services on any device that contains the iOS mobile operating system (the “App”) developed by Apple Inc. (“Apple”).
Acknowledgement. You acknowledge that this Agreement is concluded solely between us, and not with Apple, and FaceApp, not Apple, is solely responsible for the App and the content thereof. You further acknowledge that the usage rules for the App are subject to any additional restrictions set forth in the Usage Rules for the Apple App Store Terms of Service as of the date you download the App, and in the event of any conflict, the Usage Rules in the App Store shall govern if they are more restrictive. You acknowledge and agree that you have had the opportunity to review the Usage Rules.
Scope of License. The license granted to you is limited to a non-transferable license to use the App on any iPhone, iPod touch or iPad that you own or control as permitted by the Usage Rules set forth in the Apple App Store Terms of Service.
Maintenance and Support. You and FaceApp acknowledge that Apple has no obligation whatsoever to furnish any maintenance and support services with respect to the App.
Warranty. You acknowledge that Apple is not responsible for any product warranties, whether express or implied by law, with respect to the App. In the event of any failure of the App to conform to any applicable warranty, you may notify Apple, and Apple will refund the purchase price, if any, paid to Apple for the App by you; and to the maximum extent permitted by applicable law, Apple will have no other warranty obligation whatsoever with respect to the App. The parties acknowledge that to the extent that there are any applicable warranties, any other claims, losses, liabilities, damages, costs or expenses attributable to any failure to conform to any such applicable warranty would be the sole responsibility of FaceApp. However, you understand and agree that in accordance with this Agreement, FaceApp has disclaimed all warranties of any kind with respect to the App, and therefore, there are no warranties applicable to the App.
Product Claims. You and FaceApp acknowledge that as between Apple and FaceApp, FaceApp, not Apple, is responsible for addressing any claims relating to the App or your possession and/or use of the App, including, but not limited to (a) product liability claims, (b) any claim that the App fails to conform to any applicable legal or regulatory requirement, and (c) claims arising under consumer protection or similar legislation.
Intellectual Property Rights. The parties acknowledge that, in the event of any third party claim that the App or your possession and use of the App infringe that third party's intellectual property rights, FaceApp, and not Apple, will be solely responsible for the investigation, defense, settlement and discharge of any such intellectual property infringement claim to the extent required under this Agreement.
Legal Compliance. You represent and warrant that (a) you are not located in a country that is subject to a U.S. Government embargo, or that has been designated by the U.S. Government as a “terrorist supporting” country, and (b) you are not listed on any U.S. Government list of prohibited or restricted parties.
Developer Name and Address. Any questions, complaints or claims with respect to the App should be directed to:
FaceApp Inc
1000 N West Street, Suite 1200,
Wilmington, Delaware, 19801
USA
contact@faceapp.com
Third-Party Terms of Agreement. You agree to comply with any applicable third-party terms when using the Services.
Third-Party Beneficiary. The parties acknowledge and agree that Apple, and Apple's subsidiaries, are third-party beneficiaries of this Agreement, and that, upon your acceptance of this Agreement, Apple will have the right (and will be deemed to have accepted the right) to enforce this Agreement against you as a third-party beneficiary thereof.
20. Export
You may not use, export, import, or transfer all or any portion of the Services except as authorized by U.S. law, the laws of the jurisdiction in which you obtained the Services, and any other applicable laws. In particular, but without limitation, the Services may not be exported or re-exported (a) into any United States embargoed countries, or (b) to anyone on the U.S. Treasury Department's list of Specially Designated Nationals or the U.S. Department of Commerce's Denied Persons List or Entity List. By using the Services, you represent and warrant that (y) you are not located in a country that is subject to a U.S. Government embargo, or that has been designated by the U.S. Government as a “terrorist supporting” country and (z) you are not listed on any U.S. Government list of prohibited or restricted parties. You also will not use the Services for any purpose prohibited by U.S. law, including the development, design, manufacture or production of missiles, nuclear, chemical or biological weapons. You acknowledge and agree that products, services or technology provided by FaceApp are subject to the export control laws and regulations of the United States. You shall comply with these laws and regulations and shall not, without prior U.S. government authorization, export, re-export, or transfer FaceApp products, services or technology, either directly or indirectly, to any country in violation of such laws and regulations.
21. Miscellaneous
In accordance with California Civil Code section 1789.3, you may report complaints to the Complaint Assistance Unit of the Division of Consumer Services of the California Department of Consumer Affairs by contacting them in writing at 400 R Street, Sacramento, CA 95814, or by telephone at (800) 952-5210. This Agreement constitutes the entire agreement between you and FaceApp relating to your access to and use of our Services. The failure of FaceApp to exercise or enforce any right or provision of this Agreement will not operate as a waiver of such right or provision. The section titles in this Agreement are for convenience only and have no legal or contractual effect. Except as otherwise provided herein, this Agreement is intended solely for the benefit of the parties and is not intended to confer third party beneficiary rights upon any other person or entity.
Privacy Policy
Personal Information We Collect
When you use the App, we may collect information about you, including:
Photographs you provide when you use the App, via your camera or camera roll (if you have granted us permission to access your camera or camera roll), the in-App internet search functionality, or your social media account (if you choose to connect your social media account). We obtain only the specific images you chose to modify using the App; we do not collect your photo albums even if you grant us access to them. We encrypt each photograph that you upload using the App. The encryption key is stored locally on your device. This means that the only device that can view the photo is the device from which the photograph was uploaded using the App: the user's device. Please note that while we do not require or request any metadata attached to the photographs you upload, metadata (including, for example, geotags) may be associated with your photographs by default. We take steps to delete any metadata that may be associated with a photograph you provide when you use the App.
App usage information, such as information about how you use the App and interact with us, including your preferred language, the date and time when you first installed the App and the date and time you last used the App.
Purchase history, if you choose to purchase an App subscription, such as confirmation that you are a paid subscriber to the App.
Social media information, if you choose to login to the App via a third-party platform or social media network (for example, Facebook), or otherwise connect your account on the third-party platform or network to the App. We may collect information from that platform or network, such as your social media alias, first and last name, number of “friends” on the social media platform and, depending on your Facebook or other network settings, a list of your friends or connections (though we do not use or store this information). Our collection and processing of the information we obtain from social media platforms is governed by the requirements these social media platforms impose on us in their relevant terms and conditions.
Device data, such as your computer and mobile device operating system type and version number, manufacturer and model, device ID, push tokens, Google Advertising ID, Apple ID for Advertising, browser type, screen resolution, IP address (and the associated country in which you are located), the website you visited before visiting our Site; and other information about the device you are using to visit the App.
Online activity data, such as information about your use of and actions on the App and the Sites, including pages or screens you viewed, how long you spent on a page or screen, navigation paths between pages or screens, information about your activity on a page or screen, access times, and length of access. Our service providers and certain third parties (e.g., online advertising networks and their clients) also may collect this type of information over time and across third-party websites and mobile applications. This information may be collected on our Site using cookies, browser web storage (also known as locally stored objects, or “LSOs”), web beacons, and similar technologies. We may collect this information directly or through our use of third-party software development kits (“SDKs”). SDKs may enable third parties to collect information directly from our App.
How We Use Your Personal Information
We do not use the photographs you provide when you use the App for any reason other than to provide you with the portrait editing functionality of the App. We may use information other than photographs for the following purposes:
To operate and improve the App:
Enable you to use the App's features;
Establish and maintain your account, if you choose to login to the App using your social media account;
Communicate with you about the App, including by sending you announcements, updates, and security alerts, which we may send through a push notification, and responding to your requests, questions and feedback;
Provide technical support and maintenance for the App; and
Perform statistical analysis about use of the App (including through the use of Google Analytics).
To send you marketing and promotional communications. We may send you marketing communications as permitted by law. You will have the ability to opt-out of our marketing and promotional communications as described in the Opt out of marketing section below.
To display advertisements to you. If you use the free version of the App, we work with advertising partners to display advertisements within the App. These advertisements are delivered by our advertising partners and may be targeted based on your use of the App or your activity elsewhere online. To learn more about your choices in connection with advertisements, please see the section below titled “Targeted online advertising.”
For compliance, fraud prevention, and safety. We may use your personal information and disclose it to law enforcement, government authorities, and private parties as we believe necessary or appropriate to: (a) protect our, your or others' rights, privacy, safety or property (including by making and defending legal claims); (b) enforce the terms and conditions that govern the Service; and (c) protect, investigate and deter against fraudulent, harmful, unauthorized, unethical or illegal activity.
With your consent. In some cases, we may specifically ask for your consent to collect, use or share your personal information, such as when required by law.
To create anonymous, aggregated or de-identified data. We may create anonymous, aggregated or de-identified data from your personal information and other individuals whose personal information we collect. We make personal information into anonymous, aggregated or de-identified data by removing information that makes the data personally identifiable to you. We may use this anonymous, aggregated or de-identified data and share it with third parties for our lawful business purposes.
How We Share Your Personal Information
We do not disclose user photographs to third parties (with the exception of uploading an encrypted image to our cloud providers Google Cloud Platform and Amazon Web Services to provide the photo editing features of the App). We may share your non-photograph information in the following circumstances:
Affiliates. We may share App usage information with our subsidiaries and affiliates, for purposes consistent with this Privacy Policy.
Service providers. We may share your personal information with services providers that perform services on our behalf or help us operate the App (such as customer support, hosting, analytics, email delivery, marketing, and database management services). These third parties may use your personal information only as directed or authorized by us and in a manner consistent with this Privacy Policy, and are prohibited from using or disclosing your information for any other purpose.
Advertising partners. When we use third-party cookies and other tracking tools, our advertising partners may collect information from your device to help us analyze use of the Site and the App, display advertisements on the App and advertise the Site and App (and related content) elsewhere online.
Third-party platforms and social media networks. If you have enabled features or functionality that connect the App to a third-party platform or social media network (such as by logging into FaceApp using your account with the third-party, providing your API key or similar access token for the App to a third-party, or otherwise linking your account with the App to a third-party's services), we may disclose the personal information that you authorized us to share (such as when you elect to upload a photograph to your social media account). We do not control the third-party platform's use of your personal information, which is governed by that third party's privacy policy and terms and conditions.
Professional advisors. We may disclose your personal information to professional advisors, such as lawyers, bankers, auditors and insurers, where necessary in the course of the professional services that they render to us.
For compliance, fraud prevention and safety. We may share your personal information for the compliance, fraud prevention and safety purposes described above.
Business transfers. We may sell, transfer or otherwise share some or all of our business or assets, including your personal information, in connection with a business transaction (or potential business transaction) such as a corporate divestiture, merger, consolidation, acquisition, reorganization or sale of assets, or in the event of bankruptcy or dissolution.
Compliance with Law
We may be required to use and share your personal information to comply with applicable laws, lawful requests, and legal process, such as to respond to subpoenas or requests from government authorities.
Your Choices
In this section, we describe the rights and choices available to all users. Users who are located within Europe can find additional information about their rights below.
Opt out of marketing communications and other push notifications. You may opt out of marketing-related communications and other notifications we may send you via push notification by changing the settings on your mobile device.
Device permissions. You may revoke any permissions you previously granted to us, such as permission to access your camera or camera roll, through the settings on your mobile device.
Cloud processing. You may request that we remove your information, including photographs, from the cloud before the 24-48 hour period after which Google Cloud Platform or Amazon Web Services automatically deletes the information by clicking the “Request cloud data removal” button in the “Support” section of the App Settings on your mobile device.
Cookies & Browser Web Storage. Most browsers let you remove or reject cookies. To do this, follow the instructions in your browser settings. Many browsers accept cookies by default until you change your settings. Please note that if you set your browser to disable cookies, the Site may not work properly. Similarly, your browser settings may allow you to clear your browser web storage.
Targeted online advertising. Some of the business partners that collect information about users' activities on or through the Site or App may be members of organizations or programs that provide choices to individuals regarding the use of their browsing behavior or mobile application usage for purposes of targeted advertising.
Site users may opt out of receiving targeted advertising on websites through members of the Network Advertising Initiative by clicking here or the Digital Advertising Alliance by clicking here. App users may opt out of receiving targeted advertising in mobile apps through participating members of the Digital Advertising Alliance by installing the AppChoices mobile app, available here, and selecting the user's choices. Please note that we also may work with companies that offer their own opt-out mechanisms and may not participate in the opt-out mechanisms that we linked above.
In addition, your mobile device settings may provide functionality to limit our, or our partners', ability to engage in ad tracking or targeted advertising using the Google Advertising ID or Apple ID for Advertising associated with your mobile device.
If you choose to opt-out of targeted advertisements, you will still see advertisements online but they may not be relevant to you. Even if you do choose to opt out, not all companies that serve online behavioral advertising are included in this list, so you may still receive some cookies and tailored advertisements from companies that are not listed.
Choosing not to share your personal information. Where we are required by law to collect your personal information, or where we need your personal information in order to provide the App to you, if you do not provide this information when requested (or you later ask to delete it), we may not be able to provide you with our services. We will tell you what information you must provide to use the App by designating it as required at the time of collection or through other appropriate means.
Third-party platforms or social media networks. If you choose to connect to the App via a third-party platform or social media network, such as by using Facebook login, you may have the ability to limit the information that we may obtain from the third-party at the time you login to the App using the third-party's authentication service or otherwise connect your account. Subsequently, you may be able to control your settings through the third-party's platform or service. For example, you may access and change your settings through the Facebook settings page for Apps and Websites. If you withdraw our ability to access certain information from a third-party platform or social media network, that choice will not apply to information that we have already received from that third party.
Other Sites, Mobile Applications and Services
The App may contain links to other websites, mobile applications, and other online services operated by third parties. These links are not an endorsement of, or representation that we are affiliated with, any third party. In addition, our content may be included on web pages or in mobile applications or online services that are not associated with us. We do not control third party websites, mobile applications or online services, and we are not responsible for their actions. Other websites, mobile applications and online services follow different rules regarding the collection, use and sharing of your personal information. We encourage you to read the privacy policies of the other websites, mobile applications and online services you use.
Security Practices
We use commercially reasonable security practices to help keep the information collected through the App secure and take reasonable steps to verify your identity before granting you access to your account (if you have an account with us). However, FaceApp cannot ensure the security of any information you transmit to FaceApp or guarantee that information on the App may not be accessed, disclosed, altered, or destroyed.
Please do your part to help us. You are responsible for maintaining the confidentiality of your login information and device identifiers, and for controlling access to communications between you and FaceApp, at all times. Your privacy settings may also be affected by changes the social media services you connect to FaceApp make to their services. We are not responsible for the functionality, privacy, or security measures of any other organization.
Retention
We configure Google Cloud Platform and Amazon Web Services to delete photographs and photograph-related information within 24-48 hours after the photograph was last edited using the App. This allows you to revisit the image for additional modifications during that time.
With respect to non-photograph information that we may collect, we will retain such information in a personally identifiable format only for as long as necessary to fulfill the purposes we have set out in this Privacy Policy. You may also ask that we delete your information using the “Request cloud data removal” button as described above or by contacting us.
Cross-Border Data Transfers
We store the information we collect in connection with the App on Amazon Web Services and Google Cloud Platform. For Amazon Web Services, we specify the US as the data storage location; for Google Cloud Platform, we specify data storage at an available location closest to you when you use the App. Your personal information may be accessed by our service providers in other locations outside of your state, province, or country. Your device ID (and general App usage information) may also be accessed by the Company's technical support team in other locations outside of your state, province, or country. We rely on the Privacy Shield, as described below, for transfers of data from the EU and Switzerland to FaceApp in the United States.
EU-U.S. Privacy Shield and Swiss-U.S. Privacy Shield
FaceApp Inc is the US entity that publishes and hosts the App. FaceApp Inc complies with the EU-U.S. and the Swiss-U.S. Privacy Shield Frameworks as set forth by the U.S. Department of Commerce regarding the collection, use, and retention of personal information transferred from the European Union and Switzerland to the United States. FaceApp Inc has submitted its certification to the Department of Commerce that it adheres to the Privacy Shield Principles. If there is any conflict between the terms in this Privacy Policy and the Privacy Shield Principles, the Privacy Shield Principles shall govern. To learn more about the Privacy Shield program, and to view our certification, please visit www.privacyshield.gov.
FaceApp Inc may transfer your personal information to third parties as described in this Privacy Policy. FaceApp Inc maintains contracts with its third-party service providers restricting their access, use and disclosure of personal information in compliance with our Privacy Shield obligations. FaceApp Inc may be liable if these third parties fail to meet those obligations and we are responsible for the event giving rise to the damage.
In compliance with the Privacy Shield Principles, FaceApp Inc commits to resolve complaints about our collection or use of your personal information. European individuals with inquiries or complaints regarding our Privacy Policy should first contact FaceApp Inc at privacy@faceapp.com. FaceApp Inc has further committed to refer unresolved Privacy Shield complaints to JAMS, an alternative dispute resolution provider located in the United States. If you do not receive timely acknowledgment of your complaint from us, or if we have not resolved your complaint, please visit www.jamsadr.com/eu-us-privacy-shield for more information or to file a complaint. The services of JAMS are provided at no cost to you. If neither FaceApp Inc nor JAMS resolves your complaint, you may have the ability to engage in binding arbitration through the Privacy Shield Panel. Additional information on the arbitration process is available on the Privacy Shield website at www.privacyshield.gov.
FaceApp Inc may be required to disclose personal data in response to lawful requests by public authorities, including to meet national security or law enforcement requirements. The Federal Trade Commission has jurisdiction over FaceApp Inc's compliance with the Privacy Shield. FaceApp Inc's commitments under the Privacy Principles are subject to the investigatory and enforcement powers of the Federal Trade Commission.

@ -0,0 +1,75 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
*.pyc
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
.coveralls.yml
*.cover
.hypothesis/
# Sphinx documentation
docs/_build/
*.dev*
*.nja
build
dist
# Environments
.env
.venv
env/
venv/
ENV/
# Flymake
*_flymake.py
# Pattern specific ignore pattern
pattern/web/cache/tmp/
web/cache/tmp/
pattern_unittest_db
test/pattern_unittest_db
.DS_Store
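
The glob patterns in the ignore list above can be tried out with a short Python sketch. It uses the standard library's `fnmatch` module, which only approximates git's matching (it does not handle trailing-slash directory patterns, negation, or path anchoring), and the `PATTERNS` subset and the `is_ignored` helper are illustrative names, not part of the repository.

```python
from fnmatch import fnmatchcase

# A few file-name patterns copied from the ignore list above.
# Directory patterns such as "build/" are left out: fnmatch does not model them.
PATTERNS = ["*.py[cod]", "*$py.class", "*.so", "*.egg", ".DS_Store", "*_flymake.py"]


def is_ignored(filename: str) -> bool:
    """Return True if the bare file name matches any pattern (approximation only)."""
    return any(fnmatchcase(filename, pattern) for pattern in PATTERNS)


for name in ["module.pyc", "module.pyo", "module.py", ".DS_Store", "views_flymake.py"]:
    print(name, is_ignored(name))
```

Note that `*.py[cod]` already covers `*.pyc`, so the separate `*.pyc` entry is redundant but harmless.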

@ -0,0 +1,249 @@
[MASTER]
# Specify a configuration file.
#rcfile=
# Python code to execute, usually for sys.path manipulation such as
# pygtk.require().
#init-hook=
# Profiled execution.
profile=no
# Add files or directories to the blacklist. They should be base names, not
# paths.
ignore=CVS, feed, json, pdf, soup, pywordnet, svm
# Pickle collected data for later comparisons.
persistent=yes
# List of plugins (as comma separated values of python modules names) to load,
# usually to register additional checkers.
load-plugins=
[MESSAGES CONTROL]
# Enable the message, report, category or checker with the given id(s). You can
# either give multiple identifier separated by comma (,) or put this option
# multiple time.
#enable=
# Disable the message, report, category or checker with the given id(s). You
# can either give multiple identifier separated by comma (,) or put this option
# multiple time (only on the command line, not in the configuration file where
# it should appear only once).
disable=C0103,W0142,E1103
[REPORTS]
# Set the output format. Available formats are text, parseable, colorized, msvs
# (visual studio) and html
output-format=text
# Include message's id in output
include-ids=yes
# Put messages in a separate file for each module / package specified on the
# command line instead of printing them on stdout. Reports (if any) will be
# written in a file name "pylint_global.[txt|html]".
files-output=no
# Tells whether to display a full report or only the messages
reports=yes
# Python expression which should return a note less than 10 (10 is the highest
# note). You have access to the variables errors warning, statement which
# respectively contain the number of errors / warnings messages and the total
# number of statements analyzed. This is used by the global evaluation report
# (RP0004).
evaluation=10.0 - ((float(5 * error + warning + refactor + convention) / statement) * 10)
# Add a comment according to your evaluation note. This is used by the global
# evaluation report (RP0004).
comment=no
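
As a rough illustration of the `evaluation` expression above, the following Python sketch evaluates the same formula for made-up message counts. The function name `pylint_score` and the sample numbers are assumptions for the example; only the arithmetic is taken from the configuration.

```python
def pylint_score(error: int, warning: int, refactor: int,
                 convention: int, statement: int) -> float:
    """Evaluate the `evaluation=` expression from the [REPORTS] section."""
    return 10.0 - ((float(5 * error + warning + refactor + convention) / statement) * 10)


# Hypothetical counts: 2 errors, 5 warnings, 1 refactor and 4 convention
# messages across 200 analyzed statements.
print(pylint_score(error=2, warning=5, refactor=1, convention=4, statement=200))
```

Errors are weighted five times as heavily as the other message types, and nothing in the expression clamps the result, so a module with many errors relative to its statement count can score below zero.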
[BASIC]
# Required attributes for module, separated by a comma
required-attributes=
# List of builtins function names that should not be used, separated by a comma
bad-functions=map,filter,apply,input
# Regular expression which should only match correct module names
module-rgx=(([a-z_][a-z0-9_]*)|([A-Z][a-zA-Z0-9]+))$
# Regular expression which should only match correct module level names
const-rgx=(([A-Z_][A-Z0-9_]*)|(__.*__))$
# Regular expression which should only match correct class names
class-rgx=[A-Z_][a-zA-Z0-9]+$
# Regular expression which should only match correct function names
function-rgx=[a-z_][a-z0-9_]{2,30}$
# Regular expression which should only match correct method names
method-rgx=[a-z_][a-z0-9_]{2,30}$
# Regular expression which should only match correct instance attribute names
attr-rgx=[a-z_][a-z0-9_]{2,30}$
# Regular expression which should only match correct argument names
argument-rgx=[a-z_][a-z0-9_]{2,30}$
# Regular expression which should only match correct variable names
variable-rgx=[a-z_][a-z0-9_]{2,30}$
# Regular expression which should only match correct list comprehension /
# generator expression variable names
inlinevar-rgx=[A-Za-z_][A-Za-z0-9_]*$
# Good variable names which should always be accepted, separated by a comma
good-names=i,j,k,ex,Run,_
# Bad variable names which should always be refused, separated by a comma
bad-names=foo,bar,baz,toto,tutu,tata
# Regular expression which should only match functions or classes name which do
# not require a docstring
no-docstring-rgx=__.*__
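
The naming rules in this section can be checked by hand with Python's `re` module. This is a rough sketch, assuming the patterns are matched from the start of the name (as `re.match` does); the helper `is_valid_name` is hypothetical and only mimics how `good-names` short-circuits the check.

```python
import re

# Patterns copied from the [BASIC] section above.
FUNCTION_RGX = re.compile(r"[a-z_][a-z0-9_]{2,30}$")
CLASS_RGX = re.compile(r"[A-Z_][a-zA-Z0-9]+$")
GOOD_NAMES = {"i", "j", "k", "ex", "Run", "_"}


def is_valid_name(name, pattern):
    """Hypothetical helper: accept whitelisted names, else apply the pattern."""
    return name in GOOD_NAMES or pattern.match(name) is not None


for name in ("parse_tree", "parseTree", "x", "i"):
    print(name, is_valid_name(name, FUNCTION_RGX))
```

Under these patterns a one-letter name such as `x` is rejected because `{2,30}` requires at least three characters in total, which is why short loop variables have to be listed in `good-names`.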
[FORMAT]
# Maximum number of characters on a single line.
max-line-length=100
# Maximum number of lines in a module
max-module-lines=1000
# String used as indentation unit. This is usually " " (4 spaces) or "\t" (1
# tab).
indent-string=' '
[MISCELLANEOUS]
# List of note tags to take into consideration, separated by a comma.
notes=FIXME,XXX,TODO
[SIMILARITIES]
# Minimum lines number of a similarity.
min-similarity-lines=4
# Ignore comments when computing similarities.
ignore-comments=yes
# Ignore docstrings when computing similarities.
ignore-docstrings=yes
[TYPECHECK]
# Tells whether missing members accessed in mixin class should be ignored. A
# mixin class is detected if its name ends with "mixin" (case insensitive).
ignore-mixin-members=yes
# List of classes names for which member attributes should not be checked
# (useful for classes with attributes dynamically set).
ignored-classes=SQLObject
# When zope mode is activated, add a predefined set of Zope acquired attributes
# to generated-members.
zope=no
# List of members which are set dynamically and missed by pylint inference
# system, and so shouldn't trigger E0201 when accessed. Python regular
# expressions are accepted.
generated-members=REQUEST,acl_users,aq_parent
[VARIABLES]
# Tells whether we should check for unused import in __init__ files.
init-import=no
# A regular expression matching the beginning of the name of dummy variables
# (i.e. not used).
dummy-variables-rgx=_|dummy
# List of additional names supposed to be defined in builtins. Remember that
# you should avoid defining new builtins when possible.
additional-builtins=
[CLASSES]
# List of interface methods to ignore, separated by a comma. This is used for
# instance to not check methods defined in Zope's Interface base class.
ignore-iface-methods=isImplementedBy,deferred,extends,names,namesAndDescriptions,queryDescriptionFor,getBases,getDescriptionFor,getDoc,getName,getTaggedValue,getTaggedValueTags,isEqualOrExtendedBy,setTaggedValue,isImplementedByInstancesOf,adaptWith,is_implemented_by
# List of method names used to declare (i.e. assign) instance attributes.
defining-attr-methods=__init__,__new__,setUp
# List of valid names for the first argument in a class method.
valid-classmethod-first-arg=cls
[DESIGN]
# Maximum number of arguments for function / method
max-args=5
# Argument names that match this expression will be ignored. Default to name
# with leading underscore
ignored-argument-names=_.*
# Maximum number of locals for function / method body
max-locals=15
# Maximum number of return / yield for function / method body
max-returns=6
# Maximum number of branch for function / method body
max-branchs=12
# Maximum number of statements in function / method body
max-statements=50
# Maximum number of parents for a class (see R0901).
max-parents=7
# Maximum number of attributes for a class (see R0902).
max-attributes=7
# Minimum number of public methods for a class (see R0903).
min-public-methods=2
# Maximum number of public methods for a class (see R0904).
max-public-methods=20
[IMPORTS]
# Deprecated modules which should not be used, separated by a comma
deprecated-modules=regsub,string,TERMIOS,Bastion,rexec
# Create a graph of every (i.e. internal and external) dependencies in the
# given file (report RP0402 must not be disabled)
import-graph=
# Create a graph of external dependencies in the given file (report RP0402 must
# not be disabled)
ext-import-graph=
# Create a graph of internal dependencies in the given file (report RP0402 must
# not be disabled)
int-import-graph=
[EXCEPTIONS]
# Exceptions that will emit a warning when being caught. Defaults to
# "Exception"
overgeneral-exceptions=Exception

@ -0,0 +1,44 @@
language: python
dist: precise
python:
- "3.6"
before_install:
- export TZ=Europe/Brussels
- if [ ${TRAVIS_PYTHON_VERSION:0:1} == "2" ]; then wget http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh; else wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh; fi
- bash miniconda.sh -b -p $HOME/miniconda
- export PATH="$HOME/miniconda/bin:$PATH"
- conda update --yes conda
- conda install --yes numpy scipy
- pip install --quiet pytest pytest-cov pytest-xdist chardet
install:
- python setup.py install --quiet
- pip freeze
# Install and compile libsvm and liblinear
- sudo apt-get install -y build-essential
- git clone https://github.com/cjlin1/libsvm
- cd libsvm; make lib; sudo cp libsvm.so.2 /lib; sudo ln -s /lib/libsvm.so.2 /lib/libsvm.so; cd ..
- git clone https://github.com/cjlin1/liblinear
- cd liblinear; make lib; sudo cp liblinear.so.3 /lib; sudo ln -s /lib/liblinear.so.3 /lib/liblinear.so; cd ..
script:
- pytest --cov=pattern
after_script:
- pip install --quiet coveralls
- coveralls
branches:
only:
- development
notifications:
email: false
# You can connect to MySQL/MariaDB using the username "travis" or "root" and a blank password.
services:
- mysql

@ -0,0 +1,29 @@
Copyright (c) 2011-2013 University of Antwerp, Belgium
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of Pattern nor the names of its
contributors may be used to endorse or promote products
derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

@ -0,0 +1,160 @@
Pattern
=======
[![Build Status](http://img.shields.io/travis/clips/pattern/master.svg?style=flat)](https://travis-ci.org/clips/pattern/branches)
[![Coverage](https://img.shields.io/coveralls/clips/pattern/master.svg?style=flat)](https://coveralls.io/github/clips/pattern?branch=master)
[![PyPi version](http://img.shields.io/pypi/v/pattern.svg?style=flat)](https://pypi.python.org/pypi/pattern)
[![License](https://img.shields.io/badge/License-BSD%203--Clause-green.svg?style=flat)](https://github.com/clips/pattern/blob/master/LICENSE.txt)
Pattern is a web mining module for Python. It has tools for:
* Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM parser
* Natural Language Processing: part-of-speech taggers, n-gram search, sentiment analysis, WordNet
* Machine Learning: vector space model, clustering, classification (KNN, SVM, Perceptron)
* Network Analysis: graph centrality and visualization.
It is well documented, thoroughly tested with 350+ unit tests and comes bundled with 50+ examples. The source code is licensed under BSD and available from <http://www.clips.ua.ac.be/pages/pattern>.
![Example workflow](https://raw.githubusercontent.com/clips/pattern/master/docs/g/pattern_schema.gif)
Example
-------
This example trains a classifier on adjectives mined from Twitter using Python 3. First, tweets that contain hashtag #win or #fail are collected. For example: *"$20 tip off a sweet little old lady today #win"*. The word part-of-speech tags are then parsed, keeping only adjectives. Each tweet is transformed to a vector, a dictionary of adjective → count items, labeled `WIN` or `FAIL`. The classifier uses the vectors to learn which other tweets look more like `WIN` or more like `FAIL`.
```python
from pattern.web import Twitter
from pattern.en import tag
from pattern.vector import KNN, count
twitter, knn = Twitter(), KNN()
for i in range(1, 3):
    for tweet in twitter.search('#win OR #fail', start=i, count=100):
        s = tweet.text.lower()
        p = '#win' in s and 'WIN' or 'FAIL'
        v = tag(s)
        v = [word for word, pos in v if pos == 'JJ'] # JJ = adjective
        v = count(v) # {'sweet': 1}
        if v:
            knn.train(v, type=p)
print(knn.classify('sweet potato burger'))
print(knn.classify('stupid autocorrect'))
```
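
To make the vector and classification steps more concrete, here is a small standalone sketch that does not use Pattern at all. It assumes cosine similarity and a simple majority vote among the nearest neighbours; the functions `count_vector`, `cosine` and `classify` and the hand-written training data are illustrative only, and Pattern's own `count()` and `KNN` may behave differently.

```python
import math
from collections import Counter


def count_vector(words):
    """Turn a list of words into a sparse {word: count} vector."""
    return dict(Counter(words))


def cosine(v1, v2):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(n * v2.get(word, 0) for word, n in v1.items())
    norm1 = math.sqrt(sum(n * n for n in v1.values()))
    norm2 = math.sqrt(sum(n * n for n in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


def classify(vector, examples, k=3):
    """Return the majority label among the k most similar training vectors."""
    nearest = sorted(examples, key=lambda ex: cosine(vector, ex[0]), reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hand-made training data standing in for adjectives mined from tweets.
examples = [
    (count_vector(["sweet", "little", "old"]), "WIN"),
    (count_vector(["sweet", "great"]), "WIN"),
    (count_vector(["stupid", "broken"]), "FAIL"),
    (count_vector(["stupid", "slow"]), "FAIL"),
]

print(classify(count_vector(["sweet"]), examples))
print(classify(count_vector(["stupid"]), examples))
```

Each training item is a `(vector, label)` pair, mirroring the `knn.train(v, type=p)` call above, where `v` is the adjective count dictionary and `p` the label.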
Installation
------------
Pattern supports Python 2.7 and Python 3.6. To install Pattern so that it is available in all your scripts, unzip the download and from the command line do:
```bash
cd pattern-3.6
python setup.py install
```
If you have pip, you can automatically download and install from the [PyPI repository](https://pypi.python.org/pypi/Pattern):
```bash
pip install pattern
```
If none of the above works, you can make Python aware of the module in three ways:
- Put the pattern folder in the same folder as your script.
- Put the pattern folder in the standard location for modules so it is available to all scripts:
* `c:\python36\Lib\site-packages\` (Windows),
* `/Library/Python/3.6/site-packages/` (Mac OS X),
* `/usr/lib/python3.6/site-packages/` (Unix).
- Add the location of the module to `sys.path` in your script, before importing it:
```python
MODULE = '/users/tom/desktop/pattern'
import sys
if MODULE not in sys.path:
    sys.path.append(MODULE)
from pattern.en import parsetree
```
Documentation
-------------
For documentation and examples see the [user documentation](http://www.clips.ua.ac.be/pages/pattern). If you are a developer, go check out the [developer documentation](http://www.clips.ua.ac.be/pages/pattern-dev).
Version
-------
3.6
License
-------
**BSD**, see `LICENSE.txt` for further details.
Reference
---------
De Smedt, T., Daelemans, W. (2012). Pattern for Python. *Journal of Machine Learning Research, 13*, 2031-2035.
Contribute
----------
The source code is hosted on GitHub and contributions or donations are welcomed. Please have a look at the [developer documentation](http://www.clips.ua.ac.be/pages/pattern-dev). If you use Pattern in your work, please cite our reference paper.
Bundled dependencies
--------------------
Pattern is bundled with the following data sets, algorithms and Python packages:
- **Brill tagger**, Eric Brill
- **Brill tagger for Dutch**, Jeroen Geertzen
- **Brill tagger for German**, Gerold Schneider & Martin Volk
- **Brill tagger for Spanish**, trained on Wikicorpus (Samuel Reese & Gemma Boleda et al.)
- **Brill tagger for French**, trained on Lefff (Benoît Sagot & Lionel Clément et al.)
- **Brill tagger for Italian**, mined from Wiktionary
- **English pluralization**, Damian Conway
- **Spanish verb inflection**, Fred Jehle
- **French verb inflection**, Bob Salita
- **Graph JavaScript framework**, Aslak Hellesoy & Dave Hoover
- **LIBSVM**, Chih-Chung Chang & Chih-Jen Lin
- **LIBLINEAR**, Rong-En Fan et al.
- **NetworkX centrality**, Aric Hagberg, Dan Schult & Pieter Swart
- **spelling corrector**, Peter Norvig
Acknowledgements
----------------
**Authors:**
- Tom De Smedt (tom@organisms.be)
- Walter Daelemans (walter.daelemans@ua.ac.be)
**Contributors (chronological):**
- Frederik De Bleser
- Jason Wiener
- Daniel Friesen
- Jeroen Geertzen
- Thomas Crombez
- Ken Williams
- Peteris Erins
- Rajesh Nair
- F. De Smedt
- Radim Řehůřek
- Tom Loredo
- John DeBovis
- Thomas Sileo
- Gerold Schneider
- Martin Volk
- Samuel Joseph
- Shubhanshu Mishra
- Robert Elwell
- Fred Jehle
- Antoine Mazières + fabelier.org
- Rémi de Zoeten + closealert.nl
- Kenneth Koch
- Jens Grivolla
- Fabio Marfia
- Steven Loria
- Colin Molter + tevizz.com
- Peter Bull
- Maurizio Sambati
- Dan Fu
- Salvatore Di Dio
- Vincent Van Asch
- Frederik Elwert

File diff suppressed because it is too large Load Diff

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.7 KiB


@ -0,0 +1,474 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>mbsp-tags</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link type="text/css" rel="stylesheet" href="../clips.css" />
<style>
/* Small fixes because we omit the online layout.css. */
h3 { line-height: 1.3em; }
#page { margin-left: auto; margin-right: auto; }
#header, #header-inner { height: 175px; }
#header { border-bottom: 1px solid #C6D4DD; }
table { border-collapse: collapse; }
#checksum { display: none; }
</style>
<link href="../js/shCore.css" rel="stylesheet" type="text/css" />
<link href="../js/shThemeDefault.css" rel="stylesheet" type="text/css" />
<script language="javascript" src="../js/shCore.js"></script>
<script language="javascript" src="../js/shBrushXml.js"></script>
<script language="javascript" src="../js/shBrushJScript.js"></script>
<script language="javascript" src="../js/shBrushPython.js"></script>
</head>
<body class="node-type-page one-sidebar sidebar-right section-pages">
<div id="page">
<div id="page-inner">
<div id="header"><div id="header-inner"></div></div>
<div id="content">
<div id="content-inner">
<div class="node node-type-page">
<div class="node-inner">
<div class="breadcrumb">View online at: <a href="http://www.clips.ua.ac.be/pages/mbsp-tags" class="noexternal" target="_blank">http://www.clips.ua.ac.be/pages/mbsp-tags</a></div>
<h1>Penn Treebank II tag set</h1>
<!-- Parsed from the online documentation. -->
<div id="node-1274" class="node node-type-page"><div class="node-inner">
<div class="content">
<p class="big"><a href="pattern.html">Pattern</a> and&nbsp;<a href="http://www.clips.ua.ac.be/pages/MBSP" target="_self">MBSP</a> assign meaningful tags to words and groups of words in a sentence. Each tag is a short code (such as "<span class="postag">DT</span>" for "determiner").</p>
<p>The tag set is based on the Penn Treebank Tagging Guidelines [<a href="ftp://ftp.cis.upenn.edu/pub/treebank/doc/tagguide.ps.gz" target="_self">pdf</a>].</p>
<h3>Part-of-speech tags</h3>
<p>Part-of-speech tags are assigned to a single word according to its role in the sentence. Traditional grammar classifies words based on eight parts of speech: the verb (<span class="postag">VB</span>), the noun (<span class="postag">NN</span>), the pronoun (<span class="postag">PR</span>+<span class="postag">DT</span>), the adjective (<span class="postag">JJ</span>), the adverb (<span class="postag">RB</span>), the preposition (<span class="postag">IN</span>), the conjunction (<span class="postag">CC</span>), and the interjection (<span class="postag">UH</span>).</p>
<table class="border">
<tbody>
<tr>
<td><span class="smallcaps">Tag </span></td>
<td><span class="smallcaps">Description </span></td>
<td class="smallcaps">Example</td>
</tr>
<tr>
<td><span class="postag">CC </span></td>
<td>conjunction, coordinating</td>
<td><em>and, or, but</em></td>
</tr>
<tr>
<td><span class="postag">CD </span></td>
<td>cardinal number</td>
<td><em>five, three, 13%</em></td>
</tr>
<tr>
<td><span class="postag">DT </span></td>
<td>determiner</td>
<td><em>the, a, these <br /></em></td>
</tr>
<tr>
<td><span class="postag">EX </span></td>
<td>existential there</td>
<td><em><span style="text-decoration: underline;">there</span> were six boys <br /></em></td>
</tr>
<tr>
<td><span class="postag">FW </span></td>
<td>foreign word</td>
<td><em>mais <br /></em></td>
</tr>
<tr>
<td><span class="postag">IN </span></td>
<td>conjunction, subordinating or preposition</td>
<td><em>of, on, before, unless <br /></em></td>
</tr>
<tr>
<td><span class="postag">JJ </span></td>
<td>adjective</td>
<td><em>nice, easy </em></td>
</tr>
<tr>
<td><span class="postag">JJR </span></td>
<td>adjective, comparative</td>
<td><em>nicer, easier</em></td>
</tr>
<tr>
<td><span class="postag">JJS </span></td>
<td>adjective, superlative</td>
<td><em>nicest, easiest <br /></em></td>
</tr>
<tr>
<td><span class="postag">LS </span></td>
<td>list item marker</td>
<td><em>&nbsp;</em></td>
</tr>
<tr>
<td><span class="postag">MD </span></td>
<td>verb, modal auxiliary</td>
<td><em>may, should <br /></em></td>
</tr>
<tr>
<td><span class="postag">NN </span></td>
<td>noun, singular or mass</td>
<td><em>tiger, chair, laughter <br /></em></td>
</tr>
<tr>
<td><span class="postag">NNS </span></td>
<td>noun, plural</td>
<td><em>tigers, chairs, insects <br /></em></td>
</tr>
<tr>
<td><span class="postag">NNP </span></td>
<td>noun, proper singular</td>
<td><em>Germany, God, Alice <br /></em></td>
</tr>
<tr>
<td><span class="postag">NNPS </span></td>
<td>noun, proper plural</td>
<td><em>we met two <span style="text-decoration: underline;">Christmases</span> ago <br /></em></td>
</tr>
<tr>
<td><span class="postag">PDT </span></td>
<td>predeterminer</td>
<td><em><span style="text-decoration: underline;">both</span> his children <br /></em></td>
</tr>
<tr>
<td><span class="postag">POS</span></td>
<td>possessive ending</td>
<td><em>'s</em></td>
</tr>
<tr>
<td><span class="postag">PRP </span></td>
<td>pronoun, personal</td>
<td><em>me, you, it <br /></em></td>
</tr>
<tr>
<td><span class="postag">PRP$ </span></td>
<td>pronoun, possessive</td>
<td><em>my, your, our <br /></em></td>
</tr>
<tr>
<td><span class="postag">RB </span></td>
<td>adverb</td>
<td><em>extremely, loudly, hard&nbsp; <br /></em></td>
</tr>
<tr>
<td><span class="postag">RBR </span></td>
<td>adverb, comparative</td>
<td><em>better <br /></em></td>
</tr>
<tr>
<td><span class="postag">RBS </span></td>
<td>adverb, superlative</td>
<td><em>best <br /></em></td>
</tr>
<tr>
<td><span class="postag">RP </span></td>
<td>adverb, particle</td>
<td><em>about, off, up <br /></em></td>
</tr>
<tr>
<td><span class="postag">SYM </span></td>
<td>symbol</td>
<td><em>% <br /></em></td>
</tr>
<tr>
<td><span class="postag">TO </span></td>
<td>infinitival to</td>
<td><em>what <span style="text-decoration: underline;">to</span> do? <br /></em></td>
</tr>
<tr>
<td><span class="postag">UH </span></td>
<td>interjection</td>
<td><em>oh, oops, gosh <br /></em></td>
</tr>
<tr>
<td><span class="postag">VB </span></td>
<td>verb, base form</td>
<td><em>think <br /></em></td>
</tr>
<tr>
<td><span class="postag">VBZ </span></td>
<td>verb, 3rd person singular present</td>
<td><em>she <span style="text-decoration: underline;">thinks </span><br /></em></td>
</tr>
<tr>
<td><span class="postag">VBP </span></td>
<td>verb, non-3rd person singular present</td>
<td><em>I <span style="text-decoration: underline;">think </span><br /></em></td>
</tr>
<tr>
<td><span class="postag">VBD </span></td>
<td>verb, past tense</td>
<td><em>they <span style="text-decoration: underline;">thought </span><br /></em></td>
</tr>
<tr>
<td><span class="postag">VBN </span></td>
<td>verb, past participle</td>
<td><em>a <span style="text-decoration: underline;">sunken</span> ship <br /></em></td>
</tr>
<tr>
<td><span class="postag">VBG </span></td>
<td>verb, gerund or present participle</td>
<td><em><span style="text-decoration: underline;">thinking</span> is fun <br /></em></td>
</tr>
<tr>
<td><span class="postag">WDT </span></td>
<td><em>wh</em>-determiner</td>
<td><em>which, whatever, whichever <br /></em></td>
</tr>
<tr>
<td><span class="postag">WP </span></td>
<td><em>wh</em>-pronoun, personal</td>
<td><em>what, who, whom <br /></em></td>
</tr>
<tr>
<td><span class="postag">WP$</span></td>
<td><em>wh</em>-pronoun, possessive</td>
<td><em>whose, whosever <br /></em></td>
</tr>
<tr>
<td><span class="postag">WRB</span></td>
<td><em>wh</em>-adverb</td>
<td><em>where, when <br /></em></td>
</tr>
<tr>
<td><span class="postag">. </span></td>
<td>punctuation mark, sentence closer</td>
<td><em>.;?* <br /></em></td>
</tr>
<tr>
<td><span class="postag">, </span></td>
<td>punctuation mark, comma</td>
<td><em>, <br /></em></td>
</tr>
<tr>
<td><span class="postag">: </span></td>
<td>punctuation mark, colon</td>
<td><em>: <br /></em></td>
</tr>
<tr>
<td><span class="postag">( </span></td>
<td>contextual separator, left paren</td>
<td><em>( <br /></em></td>
</tr>
<tr>
<td><span class="postag">) </span></td>
<td>contextual separator, right paren</td>
<td><em>) <br /></em></td>
</tr>
</tbody>
</table>
<h3>Chunk tags</h3>
<p>Chunk tags are assigned to groups of words that belong together (i.e. phrases). The most common phrases are the noun phrase (<span class="postag">NP</span>, for example <em>the black cat</em>) and the verb phrase (<span class="postag">VP</span>, for example <em>is purring</em>).</p>
<table class="border">
<tbody>
<tr>
<td><span class="smallcaps">Tag </span></td>
<td><span class="smallcaps">Description </span></td>
<td><span class="smallcaps">Words </span></td>
<td><span class="smallcaps">Example </span></td>
<td align="right">%</td>
</tr>
<tr>
<td><span class="postag">NP </span></td>
<td>noun phrase<span class="postag">&nbsp;</span></td>
<td><span class="postag">DT</span>+<span class="postag">RB</span>+<span class="postag">JJ</span>+<span class="postag">NN</span> + <span class="postag">PR</span></td>
<td><em>the strange bird</em></td>
<td align="right">&nbsp;51</td>
</tr>
<tr>
<td><span class="postag">PP </span></td>
<td>prepositional phrase</td>
<td><span class="postag">TO</span>+<span class="postag">IN </span></td>
<td><em>in between</em></td>
<td align="right">&nbsp;19</td>
</tr>
<tr>
<td><span class="postag">VP&nbsp; </span></td>
<td>verb phrase&nbsp;</td>
<td><span class="postag">RB</span>+<span class="postag">MD</span>+<span class="postag">VB&nbsp; </span></td>
<td><em>was looking<br /></em></td>
<td align="right">9</td>
</tr>
<tr>
<td><span class="postag">ADVP</span></td>
<td>adverb phrase</td>
<td><span class="postag">RB</span></td>
<td><em>also<br /></em></td>
<td align="right">&nbsp;6</td>
</tr>
<tr>
<td><span class="postag">ADJP</span></td>
<td>adjective phrase<span class="postag">&nbsp;</span></td>
<td><span class="postag">CC</span>+<span class="postag">RB</span>+<span class="postag">JJ</span></td>
<td><em>warm and cosy</em></td>
<td align="right">&nbsp;3</td>
</tr>
<tr>
<td><span class="postag">SBAR</span></td>
<td>subordinating conjunction&nbsp;</td>
<td><span class="postag">IN</span></td>
<td><em><span style="text-decoration: underline;">whether</span> or not<br /></em></td>
<td align="right">3</td>
</tr>
<tr>
<td><span class="postag">PRT </span></td>
<td>particle</td>
<td><span class="postag">RP</span></td>
<td><em><span style="text-decoration: underline;">up</span> the stairs</em></td>
<td align="right">&nbsp;1</td>
</tr>
<tr>
<td><span class="postag">INTJ</span></td>
<td>interjection</td>
<td><span class="postag">UH</span></td>
<td><em>hello</em><em><br /></em></td>
<td align="right">&nbsp;0</td>
</tr>
</tbody>
</table>
<p>The IOB prefix marks whether a word is inside or outside of a chunk.</p>
<table class="border">
<tbody>
<tr>
<td><span class="smallcaps">Tag </span></td>
<td><span class="smallcaps">Description </span></td>
</tr>
<tr>
<td><span class="postag">I-</span></td>
<td>inside the chunk</td>
</tr>
<tr>
<td><span class="postag">B-</span></td>
<td>inside the chunk, preceding word is part of a different chunk</td>
</tr>
<tr>
<td><span class="postag">O </span></td>
<td>not part of a chunk</td>
</tr>
</tbody>
</table>
<p>A prepositional noun phrase (<span class="postag">PNP</span>) is a group of chunks starting with a preposition (<span class="postag">PP</span>) followed by noun phrases (<span class="postag">NP</span>), for example: <em>under the table</em>.</p>
<table class="border">
<tbody>
<tr>
<td><span class="smallcaps">Tag </span></td>
<td><span class="smallcaps">Description </span></td>
<td class="smallcaps">Chunks</td>
<td><span class="smallcaps">Example </span></td>
</tr>
<tr>
<td><span class="postag">PNP</span></td>
<td>prepositional noun phrase</td>
<td><span class="postag">PP</span>+<span class="postag">NP</span><span class="postag"> </span></td>
<td><em>as of today</em></td>
</tr>
</tbody>
</table>
<h3>Relation tags</h3>
<p>Relation tags describe the relation between different chunks, and clarify the role of a chunk in that relation. The most common roles in a sentence are <span class="postag">SBJ</span> (subject noun phrase) and <span class="postag">OBJ</span> (object noun phrase). They link <span class="postag">NP</span> to <span class="postag">VP</span> chunks. The subject of a sentence is the person, thing, place or idea that is <em>doing</em> or <em>being</em> something. The object of a sentence is the person/thing affected by the action.</p>
<table class="border">
<tbody>
<tr>
<td><span class="smallcaps">Tag </span></td>
<td><span class="smallcaps">Description </span></td>
<td class="smallcaps">Chunks</td>
<td><span class="smallcaps">Example </span></td>
<td align="right"><span class="smallcaps">%</span></td>
</tr>
<tr>
<td><span class="postag">-SBJ</span></td>
<td>sentence subject</td>
<td><span class="postag">NP</span><span class="postag"> </span></td>
<td><em><span style="text-decoration: underline;">the cat</span> sat on the mat<br /></em></td>
<td align="right">35</td>
</tr>
<tr>
<td><span class="postag">-OBJ</span></td>
<td>sentence object</td>
<td><span class="postag">NP</span>+<span class="postag">SBAR</span></td>
<td><em>the cat grabs <span style="text-decoration: underline;">the fish</span><br /></em></td>
<td align="right">27</td>
</tr>
<tr>
<td><span class="postag">-PRD </span></td>
<td>predicate</td>
<td><span class="postag">PP</span>+<span class="postag">NP</span>+<span class="postag">ADJP </span></td>
<td><em>the cat feels <span style="text-decoration: underline;">warm and fuzzy</span><br /></em></td>
<td align="right">7</td>
</tr>
<tr>
<td><span class="postag">-TMP</span></td>
<td>temporal&nbsp;</td>
<td><span class="postag">PP</span>+<span class="postag">NP</span>+<span class="postag">ADVP</span></td>
<td><em>arrive </em><em><span style="text-decoration: underline;">at noon</span> <br /></em></td>
<td align="right">7</td>
</tr>
<tr>
<td><span class="postag">-CLR </span></td>
<td>closely related</td>
<td><span class="postag">PP</span>+<span class="postag">NP</span>+<span class="postag">ADVP </span></td>
<td><em>work </em><em><span style="text-decoration: underline;">as a researcher</span> <br /></em></td>
<td align="right">6</td>
</tr>
<tr>
<td><span class="postag">-LOC</span></td>
<td>location&nbsp;</td>
<td><span class="postag">PP&nbsp; </span></td>
<td><em>live </em><em><span style="text-decoration: underline;">in Belgium</span> <br /></em></td>
<td align="right">4</td>
</tr>
<tr>
<td><span class="postag">-DIR&nbsp; </span></td>
<td>direction</td>
<td><span class="postag">PP </span></td>
<td><em>walk</em><em> <span style="text-decoration: underline;">towards</span> the door<br /></em></td>
<td align="right">3</td>
</tr>
<tr>
<td><span class="postag">-EXT</span></td>
<td>extent</td>
<td><span class="postag">PP</span>+<span class="postag">NP </span></td>
<td><em>drop <span style="text-decoration: underline;">10 %</span><br /></em></td>
<td align="right">1</td>
</tr>
<tr>
<td><span class="postag">-PRP</span></td>
<td>purpose</td>
<td><span class="postag">PP</span>+<span class="postag">SBAR </span></td>
<td><em>die <span style="text-decoration: underline;">as a result</span> of <br /></em></td>
<td align="right">1</td>
</tr>
</tbody>
</table>
<h3>Anchor tags</h3>
<p>Anchor tags describe how prepositional noun phrases (<span class="postag">PNP</span>) are attached to other chunks in the sentence. For example, in the sentence, <em>I eat pizza with a fork</em>, the anchor of <em>with a fork</em> is <em>eat</em> because it answers the question: "In what way do I eat?"</p>
<table class="border">
<tbody>
<tr>
<td><span class="smallcaps">Tag </span></td>
<td><span class="smallcaps">Description </span></td>
<td><span class="smallcaps">Example </span></td>
</tr>
<tr>
<td><span class="postag">A1</span></td>
<td>anchor chunk that corresponds to <span class="postag">P1</span></td>
<td><em><span style="text-decoration: underline;">eat</span> with a fork<br /></em></td>
</tr>
<tr>
<td><span class="postag">P1 </span></td>
<td><span class="postag">PNP</span> that corresponds to <span class="postag">A1 </span></td>
<td><em>eat <span style="text-decoration: underline;">with a fork</span><br /></em></td>
</tr>
</tbody>
</table>
<p>&nbsp;</p>
<p><strong>Occurrence estimate </strong><span class="small"><br /></span></p>
<p><span class="small">The given percentages for chunk and relation tags are based on tenfold cross-validation by Sabine Buchholz on sections 10 to 19 of the WSJ Corpus of the Penn Treebank II, from which we derived a rough indication. The estimate means that if 100 chunk tags are found, about 50 would be <span class="postag">NP</span> tags and 35 would have an <span class="postag">SBJ</span> relation tag. About 30 of the chunks would be tagged as <span class="postag">NP-SBJ</span>, and 15 as <span class="postag">NP-OBJ</span>.</span></p>
<p><span class="small"><span style="text-decoration: underline;">Reference</span>: Buchholz, S. (2002). <em>Memory-Based Grammatical Relation Finding</em>. ILK, Tilburg University.</span></p>
</div>
</div></div>
</div>
</div>
</div>
</div>
</div>
</div>
<script>
SyntaxHighlighter.all();
</script>
</body>
</html>

File diff suppressed because one or more lines are too long

@ -0,0 +1,700 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>pattern-db</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link type="text/css" rel="stylesheet" href="../clips.css" />
<style>
/* Small fixes because we omit the online layout.css. */
h3 { line-height: 1.3em; }
#page { margin-left: auto; margin-right: auto; }
#header, #header-inner { height: 175px; }
#header { border-bottom: 1px solid #C6D4DD; }
table { border-collapse: collapse; }
#checksum { display: none; }
</style>
<link href="../js/shCore.css" rel="stylesheet" type="text/css" />
<link href="../js/shThemeDefault.css" rel="stylesheet" type="text/css" />
<script language="javascript" src="../js/shCore.js"></script>
<script language="javascript" src="../js/shBrushXml.js"></script>
<script language="javascript" src="../js/shBrushJScript.js"></script>
<script language="javascript" src="../js/shBrushPython.js"></script>
</head>
<body class="node-type-page one-sidebar sidebar-right section-pages">
<div id="page">
<div id="page-inner">
<div id="header"><div id="header-inner"></div></div>
<div id="content">
<div id="content-inner">
<div class="node node-type-page">
<div class="node-inner">
<div class="breadcrumb">View online at: <a href="http://www.clips.ua.ac.be/pages/pattern-db" class="noexternal" target="_blank">http://www.clips.ua.ac.be/pages/pattern-db</a></div>
<h1>pattern.db</h1>
<!-- Parsed from the online documentation. -->
<div id="node-1432" class="node node-type-page"><div class="node-inner">
<div class="content">
<p class="big">The pattern.db module contains wrappers for databases (SQLite, MySQL), Unicode CSV files and Python's datetime. It offers a convenient way to work with tabular data, for example retrieved with the pattern.web module.</p>
<p>It can be used by itself or with other <a href="pattern.html">pattern</a> modules: <a href="pattern-web.html">web</a> | db | <a href="pattern-en.html">en</a> | <a href="pattern-search.html">search</a> <span class="blue"></span> | <a href="pattern-vector.html">vector</a> | <a href="pattern-graph.html">graph</a>.</p>
<p><img src="../g/pattern_schema.gif" alt="" width="620" height="180" /></p>
<hr />
<h2>Documentation</h2>
<ul style="margin-top: 0;">
<li><a href="#database">Database</a> <span class="smallcaps link-maintenance">(sqlite + mysql)</span></li>
<li><a href="#table">Table</a></li>
<li><a href="#query">Query</a></li>
<li><a href="#datasheet">Datasheet</a> <span class="smallcaps link-maintenance">(<a href="#csv">csv</a>)</span></li>
<li><a href="#date">Date</a></li>
</ul>
<p>&nbsp;</p>
<hr />
<h2><a name="database"></a>Database</h2>
<p>A database is a collection of tables. A table has rows of data with a specific data type (e.g., string, float) for each field or column. A database engine provides an interface to the database, using <a href="https://en.wikipedia.org/wiki/SQL" target="_blank">SQL</a> statements (Structured Query Language). Python 2.5+ comes bundled with the SQLite engine. The <a href="http://www.mysql.com/" target="_blank">MySQL</a> engine requires the <a href="http://sourceforge.net/projects/mysql-python/" target="_blank">MySQL-Python</a> bindings. Note that a 32-bit Python requires a 32-bit MySQL.</p>
<p>The <span class="inline_code">Database()</span> constructor creates (if necessary) and returns an <span class="inline_code">SQLITE</span> or <span class="inline_code">MYSQL</span> database. With <span class="inline_code">SQLITE</span>, it will create a file with the given name in the current folder.</p>
<pre class="brush:python; gutter:false; light:true;">db = Database(
name,
host = 'localhost',
port = 3306,
username = 'root',
password = '',
type = SQLITE
)
</pre><pre class="brush:python; gutter:false; light:true;">db.type # SQLITE | MYSQL
db.name # Database name.
db.host # Database host (MySQL).
db.port # Database port (MySQL).
db.username # Database username (MySQL).
db.password # Database password (MySQL).
db.tables # Dictionary of (name, Table)-items.
db.relations # List of relations, see Database.link().
db.query # Last executed SQL query.
db.connected # True after Database.connect(). </pre><pre class="brush:python; gutter:false; light:true;">db.connect() # Happens automatically.
db.disconnect()</pre><pre class="brush:python; gutter:false; light:true;">db.create(table, fields=[])
db.remove(table)
db.link(table1, field1, table2, field2, join=LEFT) </pre><pre class="brush:python; gutter:false; light:true;">db.execute(SQL, commit=False)
db.commit()
db.escape(value) # "a cat's tail" =&gt; "'a cat\'s tail'"</pre><ul>
<li><span class="inline_code">Database.execute()</span> returns an iterator of rows for the given SQL query.</li>
<li><span class="inline_code">Database.commit()</span> commits the changes of pending <span class="inline_code">INSERT</span>, <span class="inline_code">UPDATE</span>, <span class="inline_code">DELETE</span> queries.</li>
<li><span class="inline_code">Database.escape()</span> safely quotes and escapes field values.</li>
</ul>
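<p>To make the roles of these three methods more concrete, here is a minimal sketch that uses Python's standard <span class="inline_code">sqlite3</span> module directly instead of pattern.db. It assumes Python 3 and an in-memory database, and the <span class="inline_code">notes</span> table is hypothetical. Placeholder binding with <span class="inline_code">?</span> stands in for <span class="inline_code">Database.escape()</span> here; it is not necessarily how pattern.db quotes values internally.</p>

```python
import sqlite3

# In-memory database (an assumption for the demo), so no file is created.
connection = sqlite3.connect(':memory:')
connection.execute('create table notes (id integer primary key, body text)')

# A "?" placeholder lets the engine quote the value, apostrophe included.
connection.execute('insert into notes (body) values (?)', ("a cat's tail",))
connection.commit()  # Make the pending INSERT permanent.

# A cursor is an iterator of rows, each row a tuple of fields.
rows = [row for row in connection.execute('select id, body from notes')]
print(rows)
```

<p>The apostrophe in the value survives the round trip without any manual escaping.</p>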
<h3>Create table</h3>
<p><span class="inline_code">Database.create()</span> creates a new table in the database. It takes a table name and a list of row fields, where each field is defined with the <span class="inline_code">field()</span> function. Each field has a <span class="inline_code">name</span> (a-z + underscores) and a <span class="inline_code">type</span>, with an optional <span class="inline_code">default</span> value for new rows. The <span class="inline_code">pk()</span> function can be used for primary keys.</p>
<pre class="brush:python; gutter:false; light:true;">field(name, type=STRING, default=None, index=False, optional=True)</pre><pre class="brush: python;gutter: false; light: true; fontsize: 100; first-line: 1; ">pk(name='id') # field('id', INTEGER, index=PRIMARY, optional=False) </pre><table class="border">
<tbody>
<tr>
<td><span class="smallcaps">Type</span></td>
<td><span class="smallcaps">Value</span></td>
<td><span class="smallcaps">Example</span></td>
</tr>
<tr>
<td><span class="inline_code">STRING</span></td>
<td><span class="inline_code">str</span>, <span class="inline_code">unicode</span> (1-255 characters)</td>
<td><span class="inline_code">u'Schrödinger'</span></td>
</tr>
<tr>
<td><span class="inline_code">INTEGER</span></td>
<td><span class="inline_code">int</span></td>
<td><span class="inline_code">42</span></td>
</tr>
<tr>
<td><span class="inline_code">FLOAT</span></td>
<td><span class="inline_code">float</span></td>
<td><span class="inline_code">3.14159</span></td>
</tr>
<tr>
<td><span class="inline_code">TEXT</span></td>
<td><span class="inline_code">str</span>, <span class="inline_code">unicode</span></td>
<td><span class="inline_code">open('file.txt').read() </span></td>
</tr>
<tr>
<td><span class="inline_code">BLOB</span></td>
<td><span class="inline_code">str</span> (binary, e.g., PDF, PNG)</td>
<td><span class="inline_code">db.binary(open('img.jpg',</span> <span class="inline_code">'rb').read())</span></td>
</tr>
<tr>
<td><span class="inline_code">BOOLEAN</span></td>
<td><span class="inline_code">bool</span></td>
<td><span class="inline_code">True</span>, <span class="inline_code">False</span></td>
</tr>
<tr>
<td><span class="inline_code">DATE</span></td>
<td><span class="inline_code">Date</span></td>
<td><span class="inline_code">date('1999-12-31 23:59:59')</span></td>
</tr>
</tbody>
</table>
<p>By default, a <span class="inline_code">STRING</span> field can contain up to 100 characters. The length (1-255) can be changed by calling <span class="inline_code">STRING</span> as a function, e.g., <span class="inline_code">type=STRING(255)</span>. For longer strings, use <span class="inline_code">TEXT</span>. The default value for a <span class="inline_code">DATE</span> field is <span class="inline_code">NOW</span>.</p>
<p>With <span class="inline_code">index=True</span>, the field is indexed for faster search. The index can also be set to <span class="inline_code">UNIQUE</span> (no duplicates) or <span class="inline_code">PRIMARY</span>. A table must have a primary key field that uniquely identifies each row (i.e., an id). Integer primary keys are auto-numbered, so there is no need to set the value manually in new rows.</p>
<p>With <span class="inline_code">optional=True</span>, the field is allowed to contain <span class="inline_code">None</span>.</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.db import Database, field, pk, STRING, BOOLEAN, DATE, NOW
&gt;&gt;&gt;
&gt;&gt;&gt; db = Database('my_stuff')
&gt;&gt;&gt; db.create('pets', fields=(
&gt;&gt;&gt; pk(),
&gt;&gt;&gt; field('name', STRING(80), index=True),
&gt;&gt;&gt; field('type', STRING(20)),
&gt;&gt;&gt; field('tail', BOOLEAN),
&gt;&gt;&gt; field('date_birth', DATE, default=None),
&gt;&gt;&gt; field('date_created', DATE, default=NOW)
&gt;&gt;&gt; ))</pre></div>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; db.pets.append(name=u'Schrödinger', type='cat', tail=True)
&gt;&gt;&gt; print db.pets.rows()[0]
(1, u'Schrödinger', u'cat', True, None, Date('2013-12-11 10:09:08'))</pre></div>
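<p>As a rough illustration of what such field definitions could translate to, the sketch below builds a comparable <span class="inline_code">CREATE TABLE</span> statement with the standard <span class="inline_code">sqlite3</span> module (Python 3). The <span class="inline_code">column_sql()</span> helper and the chosen SQL types are assumptions made for this example, not pattern.db's actual implementation.</p>

```python
import sqlite3

def column_sql(name, sqltype='varchar(100)', default=None, optional=True):
    """ Returns a column definition for a CREATE TABLE statement
        (hypothetical helper, not part of pattern.db).
    """
    sql = '%s %s' % (name, sqltype)
    if not optional:
        sql += ' not null'
    if default is not None:
        sql += ' default %s' % default
    return sql

columns = [
    'id integer primary key autoincrement',      # pk()
    column_sql('name', 'varchar(80)'),
    column_sql('type', 'varchar(20)'),
    column_sql('tail', 'integer'),               # SQLite stores booleans as 0/1.
    column_sql('date_birth', 'timestamp'),
    column_sql('date_created', 'timestamp', default='current_timestamp'),
]
connection = sqlite3.connect(':memory:')
connection.execute('create table pets (%s)' % ', '.join(columns))
connection.execute('create index pets_name on pets (name)')  # index=True

connection.execute('insert into pets (name, type, tail) values (?, ?, ?)',
    ('Schrödinger', 'cat', True))
row = connection.execute(
    'select id, name, type, tail, date_birth from pets').fetchone()
created = connection.execute('select date_created from pets').fetchone()[0]
print(row)
```

<p>The id is auto-numbered, the omitted <span class="inline_code">date_birth</span> is <span class="inline_code">None</span>, and <span class="inline_code">date_created</span> is filled in by the engine.</p>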
<h3>Create table from XML</h3>
<p><span class="inline_code">Database.create()</span> can also take a <span class="inline_code">Table.xml</span> or <span class="inline_code">Query.xml</span>. It creates a new table and copies the row data in the given XML string. An optional <span class="inline_code">name</span> parameter can be used to rename the new table. In <span class="inline_code">Query.xml</span>, a field name may contain a period. It will be replaced with an underscore (e.g., pets.name → pets_name). Alternatively, an alias can be defined in the <span class="inline_code">Query.aliases</span> dictionary.</p>
<p>&nbsp;</p>
<hr />
<h2><a name="table"></a>Table</h2>
<p>A <span class="inline_code">Table</span> is a list of rows, with one or more fields (i.e., table columns) of a certain type (e.g., string or number). A new table can be created with <span class="inline_code">Database.create()</span>. A <span class="inline_code">TableError</span> is raised if a table with the given name already exists. An existing table can be retrieved with <span class="inline_code">Database.tables[name]</span>, <span class="inline_code">Database[name]</span> or <span class="inline_code">Database.&lt;name&gt;</span>.</p>
<pre class="brush:python; gutter:false; light:true;">table = Database.tables[name]</pre><pre class="brush:python; gutter:false; light:true;">table.db # Parent Database.
table.name # Table name (a-z + underscores).
table.fields # List of field names (i.e., columns).
table.schema # Dictionary of (field, Schema)-items.
table.default # Dictionary of (field, value)-items for new rows.
table.pk # Primary key field name.</pre><pre class="brush:python; gutter:false; light:true;">table.count() # Total number of rows (len(table) also works).
table.rows() # List of rows, each a tuple of fields.
</pre><pre class="brush:python; gutter:false; light:true;">table.record(row) # Dictionary of (field, value)-items for given row.</pre><pre class="brush:python; gutter:false; light:true;">table.append(fields={}, commit=True)
table.update(id, fields={}, commit=True)
table.remove(id, commit=True)
</pre><pre class="brush:python; gutter:false; light:true;">table.filter(*args, **kwargs)
table.search(*args, **kwargs) </pre><pre class="brush:python; gutter:false; light:true;">table.xml # XML string with the table schema and rows.
table.datasheet # Datasheet object (see below).</pre><ul>
<li><span class="inline_code">Table.rows()</span> returns a list of all rows. To iterate rows memory-efficiently, use <span class="inline_code">iter(</span><span class="inline_code">Table)</span>.</li>
<li><span class="inline_code">Table.append()</span>, <span class="inline_code">update()</span> and <span class="inline_code">remove()</span> modify the table contents.<br />With <span class="inline_code">commit=False</span>, changes are only committed after <span class="inline_code">Database.commit()</span> (= faster in batch).</li>
<li><span class="inline_code">Table.filter()</span> returns a subset of rows with a subset of fields.<br />For example: <span class="inline_code">table.filter('name',</span> <span class="inline_code">type='cat')</span>.</li>
</ul>
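<p>The effect of <span class="inline_code">commit=False</span> and of iterating rows instead of listing them can be sketched with the standard <span class="inline_code">sqlite3</span> module (Python 3). This is an analogy under the assumption of a plain SQLite connection; the table and names are made up for the example.</p>

```python
import sqlite3

connection = sqlite3.connect(':memory:')
connection.execute(
    'create table pets (id integer primary key, name text, type text)')

names = ['Taxi', 'Fluffy', 'Tiger']
for name in names:
    # No commit here: the rows stay pending in one open transaction.
    connection.execute(
        'insert into pets (name, type) values (?, ?)', (name, 'cat'))
connection.commit()  # One commit for the whole batch.

# Iterating the cursor fetches rows one by one instead of building a list.
count = 0
for row in connection.execute('select name from pets'):
    count += 1
print(count)
```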
<h3>Table schema</h3>
<p>The <span class="inline_code">Table.schema</span> dictionary contains field name → <span class="inline_code">Schema</span> items.</p>
<pre class="brush:python; gutter:false; light:true;">schema = Table.schema[fieldname]</pre><pre class="brush:python; gutter:false; light:true;">schema.name # Field name.
schema.type # STRING, INTEGER, FLOAT, TEXT, BLOB, BOOLEAN, DATE
schema.length # STRING field length.
schema.default # Default value.
schema.index # PRIMARY | UNIQUE | True | False
schema.optional # True or False. </pre><div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.db import Database
&gt;&gt;&gt;
&gt;&gt;&gt; db = Database('my_stuff')
&gt;&gt;&gt;
&gt;&gt;&gt; print db.pets.fields
&gt;&gt;&gt; print db.pets.schema['name'].type
&gt;&gt;&gt; print db.pets.schema['name'].length
['id', 'name', 'type', 'tail', 'date_birth', 'date_created']
STRING
80 </pre></div>
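<p>For comparison, SQLite itself can report similar schema information. The sketch below (Python 3, standard <span class="inline_code">sqlite3</span> module) reads it with <span class="inline_code">PRAGMA table_info</span> and stores it in a dictionary loosely shaped like <span class="inline_code">Table.schema</span>. The dictionary keys are chosen for this example only.</p>

```python
import sqlite3

connection = sqlite3.connect(':memory:')
connection.execute(
    'create table pets ('
    'id integer primary key, '
    'name varchar(80), '
    'type varchar(20) not null)'
)
# Each row of PRAGMA table_info is (cid, name, type, notnull, default, pk).
schema = {}
for cid, name, ctype, notnull, default, pk in connection.execute(
        'pragma table_info(pets)'):
    schema[name] = {'type': ctype, 'optional': not notnull, 'pk': bool(pk)}

print(sorted(schema))
print(schema['name'])
```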
<h3>Append row</h3>
<p><span class="inline_code">Table.append()</span> adds a new row with the given field values. It returns the row id, if the table has a primary key generated with <span class="inline_code">pk()</span>. Field values can be given as optional parameters, a dictionary or a tuple. Field values for a <span class="inline_code">BLOB</span> field must be wrapped in <span class="inline_code">Database.binary()</span>.</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; db.pets.append(name=u'Schrödinger', date_birth=date('2009-08-12'))</pre></div>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; db.pets.append({'name': u'Schrödinger', 'date_birth': date('2009-08-12')}) </pre></div>
<div class="example">
<pre class="brush: python;gutter: false; light: true; fontsize: 100; first-line: 1; ">&gt;&gt;&gt; db.pets.append((u'Schrödinger', 'cat', True, date('2009-08-12'))) # in-order</pre></div>
<h3>Update row</h3>
<p><span class="inline_code">Table.update()</span> updates values in the row with the given primary key. A batch of rows can be updated using a <a class="link-maintenance" href="#filter">filter</a>, or a chain of filters with <span class="inline_code">any()</span> or <span class="inline_code">all()</span>. In the last example below, all rows with <span class="inline_code">type='cat'</span> will have their <span class="inline_code">tail</span> field set to <span class="inline_code">True</span>.</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; db.pets.update(1, type='cat') # set type='cat' in row with id=1.</pre></div>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; db.pets.update(1, {'type': 'cat'})</pre></div>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; db.pets.update(eq('type', 'cat'), tail=True) </pre></div>
<h3>Remove row</h3>
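<p><span class="inline_code">Table.remove()</span> removes the row with the given primary key. As the examples below show, it also accepts <span class="inline_code">ALL</span> or a filter chain to remove a batch of rows:</p>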
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; db.pets.remove(1)</pre></div>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; db.pets.remove(ALL)</pre></div>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; db.pets.remove(all(eq('type', 'cat'), lt(year('date_birth'), 1990)))</pre></div>
<p>The last example removes all rows that have <span class="inline_code">type='cat'</span> AND year of birth before 1990.</p>
<h3><span>Filter rows</span></h3>
<p><span class="inline_code">Table.filter()</span> returns a list of rows filtered by field value(s), where each row is a tuple of fields. The first parameter defines which fields to return. It can be a single field name, a list of field names or <span class="inline_code">ALL</span>. The following parameters are optional and define field constraints. They can also be given as a dictionary:</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; db.pets.filter('name') # all rows, name</pre></div>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; db.pets.filter(('id', 'name')) # all rows, id + name</pre></div>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; db.pets.filter(ALL, type='cat') # type='cat', all fields</pre></div>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; db.pets.filter(ALL, type=('cat', 'dog')) # type='cat' OR type='dog' </pre></div>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; db.pets.filter(ALL, type='*at') # type='cat' OR 'hat' OR 'brat', ...</pre></div>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; db.pets.filter(ALL, type='cat', tail=True) # type='cat' AND tail=True </pre></div>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; db.pets.filter('id', {'type': 'cat', 'tail': True})
</pre></div>
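<p>One way to picture how such constraints could be turned into SQL is sketched below with the standard <span class="inline_code">sqlite3</span> module (Python 3). The helpers <span class="inline_code">constraint_sql()</span> and <span class="inline_code">select()</span> are hypothetical and only mimic the behaviour described above: a tuple becomes OR, a <span class="inline_code">*</span> wildcard becomes <span class="inline_code">LIKE</span>, and several keyword constraints are joined with AND.</p>

```python
import sqlite3

def constraint_sql(field, value):
    """ Returns a (SQL, parameters)-tuple for one field constraint
        (hypothetical helper, not pattern.db's own implementation).
        Field names are trusted here; only values are bound as parameters.
    """
    if isinstance(value, (tuple, list)):
        parts = [constraint_sql(field, v) for v in value]
        sql = '(' + ' or '.join(s for s, p in parts) + ')'
        return sql, [x for s, p in parts for x in p]
    if isinstance(value, str) and '*' in value:
        return '%s like ?' % field, [value.replace('*', '%')]
    return '%s = ?' % field, [value]

connection = sqlite3.connect(':memory:')
connection.execute('create table pets (name text, type text, tail integer)')
connection.executemany('insert into pets values (?, ?, ?)', [
    ('Taxi', 'cat', 1),
    ('Rex', 'dog', 1),
    ('Bowler', 'hat', 0),
])

def select(fields, **constraints):
    # Keyword constraints are combined with AND.
    parts = [constraint_sql(k, v) for k, v in sorted(constraints.items())]
    sql = 'select %s from pets' % ', '.join(fields)
    if parts:
        sql += ' where ' + ' and '.join(s for s, p in parts)
    params = [x for s, p in parts for x in p]
    return connection.execute(sql + ' order by name', params).fetchall()

print(select(['name'], type='*at'))            # type ends in "at"
print(select(['name'], type=('cat', 'dog')))   # type='cat' OR type='dog'
print(select(['name'], type='*at', tail=1))    # wildcard AND tail
```

<p>Note that SQLite's <span class="inline_code">LIKE</span> is case-insensitive for ASCII characters, which may differ from how a given pattern.db version treats wildcards.</p>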
<p>More complex queries can be constructed with a <span class="inline_code">Query</span>.</p>
<p>&nbsp;</p>
<hr />
<h2><a name="query"></a>Query</h2>
<p><span class="inline_code">Table.search()</span> returns a new <span class="inline_code">Query</span> with options for filtering, sorting and ordering rows by field value(s). It can include fields from other, related tables.</p>
<pre class="brush:python; gutter:false; light:true;">query = Table.search(
fields = ALL,
filters = [],
relations = [],
sort = None,
order = ASCENDING,
group = None,
function = FIRST,
range = None
)</pre><pre class="brush:python; gutter:false; light:true;">query.table # Parent Table.
query.fields # Field name, list of field names, or ALL.
query.aliases # Dictionary of (field name, alias)-items.
query.filters # List of filter() objects.
query.relations # List of rel() objects.
query.sort # Field name or list of field names.
query.order # ASCENDING | DESCENDING
query.group # Field name or list of field names.
query.function # FIRST, LAST, COUNT, MIN, MAX, SUM, AVG, STDEV, CONCATENATE
query.range # (start, stop)-tuple, e.g. rows 11-20.</pre><pre class="brush:python; gutter:false; light:true;">query.sql() # SQL string, can be used with Database.execute().</pre><pre class="brush:python; gutter:false; light:true;">query.rows() # List of rows, each a tuple of fields.</pre><pre class="brush:python; gutter:false; light:true;">query.record(row) # Dictionary of (field, value)-items for given row.</pre><pre class="brush:python; gutter:false; light:true;">query.xml # XML string with the query schema and rows.</pre><p>To iterate rows memory-efficiently, use <span class="inline_code">iter(Query)</span> instead of <span class="inline_code">Query.rows()</span>.</p>
<h3><a name="filter"></a>Query filter</h3>
<p>The <span class="inline_code">filter()</span> function creates a field-value constraint that matches certain rows in a table. A list of filters can be passed to the <span class="inline_code">filters</span> parameter of a <span class="inline_code">Query</span>.</p>
<pre class="brush:python; gutter:false; light:true;">filter(field, value, comparison='=')</pre><table class="border">
<tbody>
<tr>
<td style="text-align: center;"><span class="smallcaps">Comparison</span></td>
<td><span class="smallcaps">Description</span></td>
<td><span class="smallcaps">Example</span></td>
<td style="text-align: center;"><span class="smallcaps">Alias</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">=</span></td>
<td>equal to</td>
<td><span class="inline_code">filter('type',</span> <span class="inline_code">('cat',</span> <span class="inline_code">'dog'),</span> <span class="inline_code">'=') </span></td>
<td style="text-align: left;"><span class="inline_code">&nbsp;eq()</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">i=</span></td>
<td>equal to (case-insensitive)</td>
<td><span class="inline_code">filter('name',</span> <span class="inline_code">'tig*',</span> <span class="inline_code">'i=') </span></td>
<td style="text-align: left;"><span class="inline_code">&nbsp;eqi()</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">!=</span></td>
<td>not equal to</td>
<td><span class="inline_code">filter('name',</span> <span class="inline_code">'*y',</span> <span class="inline_code">'!=')</span></td>
<td style="text-align: left;"><span class="inline_code">&nbsp;ne()</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">&gt;</span></td>
<td>greater than</td>
<td><span class="inline_code">filter('weight',</span> <span class="inline_code">10,</span> <span class="inline_code">'&gt;') </span></td>
<td style="text-align: left;"><span class="inline_code">&nbsp;gt()</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">&lt;</span></td>
<td>less than</td>
<td><span class="inline_code">filter('weight',</span> <span class="inline_code">10,</span> <span class="inline_code">'&lt;') </span></td>
<td style="text-align: left;"><span class="inline_code">&nbsp;lt()</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">&gt;=</span></td>
<td>greater than or equal to</td>
<td><span class="inline_code">filter(year('date'),</span> <span class="inline_code">1999,</span> <span class="inline_code">'&gt;=') </span></td>
<td style="text-align: left;"><span class="inline_code">&nbsp;gte()</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">&lt;=</span></td>
<td>less than or equal to</td>
<td><span class="inline_code">filter(year('date'),</span> <span class="inline_code">2002,</span> <span class="inline_code">'&lt;=')</span></td>
<td style="text-align: left;"><span class="inline_code">&nbsp;lte()</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">:</span></td>
<td>between (inclusive)</td>
<td><span class="inline_code">filter(year('date'),</span> <span class="inline_code">(1999,</span> <span class="inline_code">2002),</span> <span class="inline_code">':')</span></td>
<td style="text-align: left;"><span class="inline_code">&nbsp;rng()</span></td>
</tr>
</tbody>
</table>
<p>The field name of a <span class="inline_code">DATE</span> field can be passed to the&nbsp;<span class="inline_code">year()</span>, <span class="inline_code">month()</span>, <span class="inline_code">day()</span>, <span class="inline_code">hour()</span>, <span class="inline_code">minute()</span> or <span class="inline_code">second()</span> function. The short aliases of <span class="inline_code">filter()</span> have a preset comparison operator.</p>
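<p>To illustrate what a date-part comparison such as "year between 1999 and 2002 (inclusive)" amounts to in plain SQLite, here is a small sketch with the standard <span class="inline_code">sqlite3</span> module (Python 3). The table, the dates and the use of <span class="inline_code">strftime()</span> are assumptions for the example; pattern.db may generate different SQL.</p>

```python
import sqlite3

connection = sqlite3.connect(':memory:')
connection.execute('create table pets (name text, date_birth text)')
connection.executemany('insert into pets values (?, ?)', [
    ('Taxi', '1998-05-01 00:00:00'),
    ('Fluffy', '1999-12-31 23:59:59'),
    ('Tiger', '2002-01-15 08:30:00'),
    ('Rex', '2005-07-04 12:00:00'),
])
# strftime('%Y', ...) extracts the year as text; cast it to compare numbers.
year = "cast(strftime('%Y', date_birth) as integer)"
rows = connection.execute(
    'select name from pets where %s between ? and ? order by name' % year,
    (1999, 2002)
).fetchall()
print(rows)
```

<p>Both boundary years are included in the result, matching the inclusive behaviour of the <span class="inline_code">:</span> comparison.</p>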
<h3>Query filter chain</h3>
<p>Filters can be chained together. The <span class="inline_code">all()</span> function returns a list with AND logic. The <span class="inline_code">any()</span> function returns a list with OR logic. In the example below, the first query matches <span style="text-decoration: underline;">all</span> cats named Taxi. The second and third query match <span style="text-decoration: underline;">any</span> pet that is a cat OR is named Taxi.</p>
<pre class="brush:python; gutter:false; light:true;">all(filter1, filter2, ...) # Rows must match ALL of the filters.</pre><pre class="brush:python; gutter:false; light:true;">any(filter1, filter2, ...) # Rows must match ANY of the filters.</pre><div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.db import Database, eq, all, any
&gt;&gt;&gt;
&gt;&gt;&gt; db = Database('my_stuff')
&gt;&gt;&gt;
&gt;&gt;&gt; db.pets.search(filters=all(eq('name', 'Taxi'), eq('type', 'cat')))
&gt;&gt;&gt; db.pets.search(filters=any(eq('name', 'Taxi'), eq('type', 'cat')))
&gt;&gt;&gt; db.pets.search(filters=any(name='Taxi', type='cat')) </pre></div>
<p>Lists created with <span class="inline_code">all()</span> and <span class="inline_code">any()</span> can be nested to define complex search criteria. The example below matches all pets that are cats, and whose name starts with Fluff- OR ends with -y:</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; f = any(eq('name', 'Fluff*'), eq('name', '*y')) # OR
&gt;&gt;&gt; f = all(eq('type', 'cat'), f) # AND
&gt;&gt;&gt;
&gt;&gt;&gt; for row in db.pets.search(filters=f):
&gt;&gt;&gt; print row</pre></div>
<p>The syntax can be even more concise:</p>
<div class="example">
<pre class="brush: python;gutter: false; light: true; fontsize: 100; first-line: 1; ">&gt;&gt;&gt; for row in db.pets.search(filters=all(name=('Fluff*', '*y'), type='cat')):
&gt;&gt;&gt; print row </pre></div>
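<p>The nesting logic itself can be sketched in plain Python 3 without a database. In the sketch below, <span class="inline_code">Chain</span>, <span class="inline_code">all_of()</span>, <span class="inline_code">any_of()</span> and <span class="inline_code">matches()</span> are hypothetical names chosen to avoid shadowing Python's built-in <span class="inline_code">all()</span> and <span class="inline_code">any()</span>. Rows are dictionaries, and wildcards are handled with the standard <span class="inline_code">fnmatch</span> module.</p>

```python
from fnmatch import fnmatchcase

class Chain(object):
    """ A group of filters combined with AND (all) or OR (any).
    """
    def __init__(self, operator, filters):
        self.operator = operator  # Python's built-in all() or any().
        self.filters = filters

def eq(field, value):
    return (field, value)

def all_of(*filters):
    return Chain(all, filters)

def any_of(*filters):
    return Chain(any, filters)

def matches(row, f):
    """ Returns True if the row (a dictionary) satisfies the filter or chain.
    """
    if isinstance(f, Chain):
        # Nested chains are evaluated recursively.
        return f.operator(matches(row, g) for g in f.filters)
    field, pattern = f
    return fnmatchcase(str(row[field]), pattern)

pets = [
    {'name': 'Fluffy', 'type': 'cat'},
    {'name': 'Taxi', 'type': 'cat'},
    {'name': 'Lucky', 'type': 'dog'},
    {'name': 'Fluff Jr.', 'type': 'cat'},
]
f = any_of(eq('name', 'Fluff*'), eq('name', '*y'))  # OR
f = all_of(eq('type', 'cat'), f)                    # AND
found = [pet['name'] for pet in pets if matches(pet, f)]
print(found)
```

<p>Lucky ends with -y but is a dog, and Taxi is a cat whose name matches neither pattern, so both are left out.</p>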
<h3>Query relation</h3>
<p>The <span class="inline_code">rel()</span> function defines a relation between two fields in different tables (usually ids).</p>
<pre class="brush: python;gutter: false; light: true; fontsize: 100; first-line: 1; ">rel(field1, field2, table, join=LEFT) # LEFT | INNER</pre><p>The optional <span class="inline_code">join</span> parameter defines how rows are matched. <span class="inline_code">LEFT</span> takes all rows from the base table, with additional fields from the related table. For a row with no match between <span class="inline_code">field1</span> and <span class="inline_code">field2</span>, these fields have value <span class="inline_code">None</span>. <span class="inline_code">INNER</span> takes the subset of rows that have a match between <span class="inline_code">field1</span> and <span class="inline_code">field2</span>.</p>
<p>A well-known example is a database app that processes invoices. Say we have a products table and an orders table. Each order has a product id instead of all product details. Each product id can occur in multiple orders. This approach is called database normalization. It avoids duplicate data. To generate an invoice, we can combine product details and order details using a query relation.</p>
<p>The following example demonstrates a simple products + customers + orders database app:</p>
<table class="border=">
<tbody>
<tr>
<td>
<table class="border" style="margin: 0;">
<tbody>
<tr>
<td style="text-align: center;" colspan="3"><span class="smallcaps">products</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="smallcaps">id</span></td>
<td style="text-align: left;"><span class="smallcaps">name</span></td>
<td style="text-align: center;"><span class="smallcaps">price</span></td>
</tr>
<tr>
<td style="text-align: center;">1</td>
<td style="text-align: left;">pizza</td>
<td style="text-align: center;">15</td>
</tr>
<tr>
<td style="text-align: center;">2</td>
<td style="text-align: left;">garlic bread</td>
<td style="text-align: center;">3</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="border" style="margin: 0;">
<tbody>
<tr>
<td style="text-align: center;" colspan="2"><span class="smallcaps">customers</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="smallcaps">id</span></td>
<td style="text-align: left;"><span class="smallcaps">name</span></td>
</tr>
<tr>
<td style="text-align: center;">1</td>
<td style="text-align: left;">Schrödinger</td>
</tr>
<tr>
<td style="text-align: center;">2</td>
<td style="text-align: left;">Hofstadter</td>
</tr>
</tbody>
</table>
</td>
<td>
<table class="border" style="margin: 0;">
<tbody>
<tr>
<td style="text-align: center;" colspan="3"><span class="smallcaps">orders</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="smallcaps">id</span></td>
<td style="text-align: center;"><span class="smallcaps">product</span></td>
<td style="text-align: center;"><span class="smallcaps">customer</span></td>
</tr>
<tr>
<td style="text-align: center;">1</td>
<td style="text-align: center;">1</td>
<td style="text-align: center;">2</td>
</tr>
<tr>
<td>&nbsp;</td>
<td>&nbsp;</td>
<td>&nbsp;</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.db import Database, field, pk, INTEGER as I
&gt;&gt;&gt;
&gt;&gt;&gt; db = Database('pizza_delivery')
&gt;&gt;&gt;
&gt;&gt;&gt; db.create( 'products', (pk(), field('name'), field('price', I)))
&gt;&gt;&gt; db.create('customers', (pk(), field('name')))
&gt;&gt;&gt; db.create( 'orders', (pk(), field('product', I), field('customer', I)))</pre></div>
<div class="example">Add products and customers. Pizza delivery is open for business!</div>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; db.products.append(name='pizza', price=15)
&gt;&gt;&gt; db.products.append(name='garlic bread', price=3)</pre></div>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; db.customers.append(name=u'Schrödinger')
&gt;&gt;&gt; db.customers.append(name=u'Hofstadter')</pre></div>
<p>Hofstadter orders a pizza.</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; db.orders.append(product=1, customer=2)</pre></div>
<div class="example">An orders query with relations to products and customers generates a human-readable invoice:</div>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.db import Database, rel
&gt;&gt;&gt;
&gt;&gt;&gt; db = Database('pizza_delivery')
&gt;&gt;&gt;
&gt;&gt;&gt; f = ('orders.id', 'customers.name', 'products.name', 'products.price')
&gt;&gt;&gt; q = db.orders.search(f, relations=(
&gt;&gt;&gt; rel('orders.customer', 'customers.id', 'customers'),
&gt;&gt;&gt; rel('orders.product', 'products.id', 'products'))
&gt;&gt;&gt; )
&gt;&gt;&gt; for row in q:
&gt;&gt;&gt; print q.record(row)
{ 'orders.id' : 1,
'customers.name' : u'Hofstadter',
'products.name' : u'pizza',
'products.price' : 15 }</pre></div>
<div class="example">If a relation is used repeatedly, define it once with <span class="inline_code">Database.link()</span>. It will be available in every <span class="inline_code">Query</span>.</div>
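<p>The difference between <span class="inline_code">LEFT</span> and <span class="inline_code">INNER</span> can be seen by starting from the customers table instead of the orders table. The following sketch uses the standard <span class="inline_code">sqlite3</span> module (Python 3) with the same customers and the single order from above. The SQL is written by hand for illustration and is not taken from pattern.db.</p>

```python
import sqlite3

connection = sqlite3.connect(':memory:')
connection.executescript('''
    create table customers (id integer primary key, name text);
    create table orders (
        id integer primary key, product integer, customer integer);
    insert into customers (name) values ('Schrödinger');
    insert into customers (name) values ('Hofstadter');
    insert into orders (product, customer) values (1, 2);
''')
sql = ('select customers.name, orders.id from customers '
       '%s join orders on orders.customer = customers.id '
       'order by customers.id')

left = connection.execute(sql % 'left').fetchall()    # All customers.
inner = connection.execute(sql % 'inner').fetchall()  # Only those with orders.
print(left)
print(inner)
```

<p>With the left join, Schrödinger is listed with <span class="inline_code">None</span> as order id. With the inner join, only Hofstadter remains.</p>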
<h3>Grouping rows</h3>
<p>A <span class="inline_code">Query</span> has an optional parameter <span class="inline_code">group</span> that can be used to merge rows on duplicate field values. The given <span class="inline_code">function</span> is applied to the other fields. It can also be a list with a function for each field.</p>
<table class="border">
<tbody>
<tr>
<td><span class="smallcaps">Function</span></td>
<td style="text-align: center;"><span class="smallcaps">Field type</span></td>
<td><span class="smallcaps">Description</span></td>
</tr>
<tr>
<td><span class="inline_code">FIRST</span></td>
<td style="text-align: center;">any</td>
<td>The first row field in the group.</td>
</tr>
<tr>
<td><span class="inline_code">LAST</span></td>
<td style="text-align: center;">any</td>
<td>The last row field in the group.</td>
</tr>
<tr>
<td><span class="inline_code">COUNT</span></td>
<td style="text-align: center;">any</td>
<td>The number of rows in the group.</td>
</tr>
<tr>
<td><span class="inline_code">MIN</span></td>
<td style="text-align: center;"><span class="inline_code">INTEGER</span> + <span class="inline_code">FLOAT</span></td>
<td>The lowest field value in the group.</td>
</tr>
<tr>
<td><span class="inline_code">MAX</span></td>
<td style="text-align: center;"><span class="inline_code">INTEGER</span> + <span class="inline_code">FLOAT</span></td>
<td>The highest field value in the group.</td>
</tr>
<tr>
<td><span class="inline_code">SUM</span></td>
<td style="text-align: center;"><span class="inline_code">INTEGER</span> + <span class="inline_code">FLOAT</span></td>
<td>The sum of all field values in the group.</td>
</tr>
<tr>
<td><span class="inline_code">AVG</span></td>
<td style="text-align: center;"><span class="inline_code">INTEGER</span> + <span class="inline_code">FLOAT</span></td>
<td>The average of all field values in the group.</td>
</tr>
<tr>
<td><span class="inline_code">STDEV</span></td>
<td style="text-align: center;"><span class="inline_code">INTEGER</span> + <span class="inline_code">FLOAT</span></td>
<td>The standard deviation (= variation from average).</td>
</tr>
<tr>
<td><span class="inline_code">CONCATENATE</span></td>
<td style="text-align: center;"><span class="inline_code">STRING</span></td>
<td>Joins all field values with a comma.</td>
</tr>
</tbody>
</table>
<p>For example, to get the total revenue per ordered product:</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; print db.orders.search(
&gt;&gt;&gt; fields = ('products.name', 'products.price'),
&gt;&gt;&gt; relations = rel('product', 'products.id', 'products'),
&gt;&gt;&gt; group = 'products.name', # Merge orders with same product name.
&gt;&gt;&gt; function = SUM # Sum of product prices.
&gt;&gt;&gt; ).rows()</pre></div>
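<p>The grouping idea can also be tried out with the standard <span class="inline_code">sqlite3</span> module. The sketch below is not pattern.db: it assumes that <span class="inline_code">group</span> with <span class="inline_code">function=SUM</span> behaves like SQL <span class="inline_code">GROUP BY</span> with <span class="inline_code">SUM()</span>, and the two extra orders are hypothetical data added so that the sum is visible.</p>

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price INTEGER);
    CREATE TABLE orders   (id INTEGER PRIMARY KEY, product INTEGER, customer INTEGER);
    INSERT INTO products (name, price) VALUES ('pizza', 15), ('garlic bread', 3);
    -- Hypothetical orders: two pizzas and one garlic bread.
    INSERT INTO orders (product, customer) VALUES (1, 2), (1, 1), (2, 1);
""")

# Merge orders with the same product name and sum the product prices.
revenue = con.execute("""
    SELECT products.name, SUM(products.price)
    FROM orders
    LEFT JOIN products ON orders.product = products.id
    GROUP BY products.name
    ORDER BY products.name
""").fetchall()
print(revenue)
```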
<p>&nbsp;</p>
<hr />
<h2><a name="datasheet"></a>Datasheet</h2>
<p>A <span class="inline_code">Datasheet</span> is a matrix of rows and columns, where each row and column can be retrieved as a list. The data can be imported or exported as a CSV-file. Optionally, the given <span class="inline_code">fields</span> is a list of <span class="inline_code">(name,</span> <span class="inline_code">type)</span> headers, where <span class="inline_code">type</span> can be <span class="inline_code">STRING</span>, <span class="inline_code">TEXT</span>, <span class="inline_code">INTEGER</span>, <span class="inline_code">FLOAT</span>, <span class="inline_code">BOOLEAN</span>, <span class="inline_code">BLOB</span> or <span class="inline_code">DATE</span>.</p>
<pre class="brush:python; gutter:false; light:true;">datasheet = Datasheet(rows=[], fields=None)</pre><pre class="brush:python; gutter:false; light:true;">datasheet = Datasheet.load(path, separator=',', decoder=lambda v: v, headers=False)
</pre><pre class="brush:python; gutter:false; light:true;">datasheet.rows # List of rows (each row = list of values).
datasheet.columns # List of columns (each column = list of values).
datasheet.fields # List of (name, type) column headers.
datasheet.&lt;field&gt; # List of column values. </pre><pre class="brush:python; gutter:false; light:true;">datasheet[i] # Row at index i.
datasheet[i, j] # Value in row i at column j.
datasheet[i1:i2, j] # Slice of column j from rows i1-i2.
datasheet[i, j1:j2] # Slice of columns j1-j2 from row i.
datasheet[i1:i2, j1:j2] # Datasheet with columns j1-j2 from rows i1-i2.
datasheet[:] # Datasheet copy. </pre><pre class="brush:python; gutter:false; light:true;">datasheet.insert(i, row, default=None)
datasheet.append(row, default=None)
datasheet.extend(rows, default=None)
datasheet.copy(rows=ALL, columns=ALL)</pre><pre class="brush:python; gutter:false; light:true;">datasheet.group(j, function=FIRST, key=lambda v: v)</pre><pre class="brush:python; gutter:false; light:true;">datasheet.save(path, separator=',', encoder=lambda v: v, headers=False)</pre><pre class="brush:python; gutter:false; light:true;">datasheet.json # JSON-formatted string.</pre><ul>
<li><span class="inline_code">Datasheet.insert()</span> and <span class="inline_code">append()</span> fill missing columns with the <span class="inline_code">default</span> value.</li>
<li><span class="inline_code">Datasheet.columns.insert()</span> and <span class="inline_code">append()</span> fill missing rows with the <span class="inline_code">default</span> value.<br />An optional <span class="inline_code">field</span> parameter can be used to supply a (<span class="inline_code">name</span>, <span class="inline_code">type</span>) column header.</li>
<li><span class="inline_code">Datasheet.copy()</span> returns a new <span class="inline_code">Datasheet</span> from a selective list of row and/or column indices.</li>
<li>To rotate a datasheet 90 degrees, use <span class="inline_code">datasheet</span> <span class="inline_code">=</span> <span class="inline_code">flip(datasheet)</span>.</li>
</ul>
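<p>The notes above on default values and on rows versus columns can be illustrated with plain Python lists. The helper names <span class="inline_code">pad()</span>, <span class="inline_code">append_row()</span> and <span class="inline_code">columns()</span> are hypothetical and not part of pattern.db; the sketch only shows the idea of keeping the matrix rectangular.</p>

```python
def pad(row, width, default=None):
    """Return row as a list, filled up to the given width with the default value."""
    row = list(row)
    return row + [default] * (width - len(row))

def append_row(rows, row, default=None):
    """Append a row and fill shorter rows so the matrix stays rectangular."""
    width = max([len(row)] + [len(r) for r in rows])
    return [pad(r, width, default) for r in rows] + [pad(row, width, default)]

def columns(rows):
    """Return the matrix as a list of columns (each column = list of values)."""
    return [list(column) for column in zip(*rows)]

rows = [["Schrödinger", "cat"]]
rows = append_row(rows, ["Hofstadter"], default="?")  # The missing column is filled with "?".
print(rows)
print(columns(rows))
```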
<p>For example:</p>
<div class="example">
<pre class="brush: python;gutter: false; light: true; fontsize: 100; first-line: 1; ">&gt;&gt;&gt; from pattern.db import Datasheet
&gt;&gt;&gt;
&gt;&gt;&gt; ds = Datasheet()
&gt;&gt;&gt; ds.append((u'Schrödinger', 'cat'))
&gt;&gt;&gt; ds.append((u'Hofstadter', 'cat'))
&gt;&gt;&gt; ds.save('pets.csv')
&gt;&gt;&gt;
&gt;&gt;&gt; ds = Datasheet.load('pets.csv')
&gt;&gt;&gt; print ds
[[u'Schrödinger', 'cat'],
[ u'Hofstadter', 'cat']]</pre></div>
<h3>Grouping rows</h3>
<p><span class="inline_code">Datasheet.group(j)</span> returns a new <span class="inline_code">Datasheet</span> with unique values in column <span class="inline_code">j</span>. It merges rows using a given <span class="inline_code">function</span> that takes a list of column values and returns a single value. Predefined functions are <span class="inline_code">FIRST</span>, <span class="inline_code">LAST</span>, <span class="inline_code">COUNT</span>, <span class="inline_code">MIN</span>, <span class="inline_code">MAX</span>, <span class="inline_code">SUM</span>, <span class="inline_code">AVG</span>, <span class="inline_code">STDEV</span> and <span class="inline_code">CONCATENATE</span>. It can also be a list of functions.</p>
<p>The optional <span class="inline_code">key</span> can be used to compare the values in column <span class="inline_code">j</span>. For example, <span class="inline_code">lambda</span> <span class="inline_code">date:</span> <span class="inline_code">date.year</span> groups a column of <span class="inline_code">Date</span> objects by year.</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.db import Datasheet, COUNT, pprint
&gt;&gt;&gt;
&gt;&gt;&gt; ds = Datasheet(rows=[
&gt;&gt;&gt; (1, u'Schrödinger', 'cat'),
&gt;&gt;&gt; (2, u'Hofstadter', 'cat'),
&gt;&gt;&gt; (3, u'Taxi', 'dog')
&gt;&gt;&gt; ])
&gt;&gt;&gt;
&gt;&gt;&gt; g = ds.copy(columns=[2, 0]) # A copy with type &amp; id.
&gt;&gt;&gt; g = g.group(0, COUNT) # Group type, count rows per type.
&gt;&gt;&gt; pprint(g, fill='')
cat 2
dog 1 </pre></div>
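<p>For readers who want to see the mechanics, grouping with a function and a key can be sketched in plain Python. The <span class="inline_code">group()</span> function below is a hypothetical stand-in, not the library's implementation; it assumes that the first value seen in column <span class="inline_code">j</span> is kept for each group, and the third date is made-up sample data.</p>

```python
from collections import OrderedDict
from datetime import date

def group(rows, j, function=lambda values: values[0], key=lambda v: v):
    """Merge rows with the same key(row[j]); apply function to every other column."""
    buckets = OrderedDict()
    for row in rows:
        buckets.setdefault(key(row[j]), []).append(row)
    merged = []
    for members in buckets.values():
        cols = list(zip(*members))  # One tuple of values per column.
        merged.append([
            members[0][i] if i == j else function(list(col))
            for i, col in enumerate(cols)
        ])
    return merged

# Count rows per pet type (len plays the role of COUNT).
pets = [("cat", 1), ("cat", 2), ("dog", 3)]
counts = group(pets, 0, function=len)
print(counts)

# Group a column of dates by year using the key function.
births = [
    (date(1887, 8, 12), "Schrödinger"),
    (date(1945, 2, 15), "Hofstadter"),
    (date(1945, 6, 1), "Taxi"),  # Hypothetical date, for illustration only.
]
by_year = group(births, 0, function=len, key=lambda d: d.year)
print(by_year)
```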
<h3>Sorting rows &amp; columns</h3>
<p><span class="inline_code">Datasheet.columns[j].sort()</span> sorts the rows according to the values in column <span class="inline_code">j</span>. <br /><span class="inline_code">Datasheet.columns.sort()</span> can be used to change the column order:</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; ds.columns.sort(order=[0, 2, 1])
&gt;&gt;&gt; pprint(ds, fill='')
1 cat Schrödinger
2 cat Hofstadter
3 dog Taxi</pre></div>
<p><span class="inline_code">Datasheet.columns.swap(j1,j2)</span> swaps two individual columns with given indices.</p>
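<p>The three operations in this section (sorting rows by a column, reordering columns, swapping two columns) can be sketched on a plain list of rows. The function names are hypothetical helpers, and unlike the <span class="inline_code">Datasheet</span> methods they return new lists rather than changing the data in place.</p>

```python
rows = [
    [3, "Taxi", "dog"],
    [1, "Schrödinger", "cat"],
    [2, "Hofstadter", "cat"],
]

def sort_rows(rows, j, reverse=False):
    """Return the rows ordered by the value in column j."""
    return sorted(rows, key=lambda row: row[j], reverse=reverse)

def reorder_columns(rows, order):
    """Return rows whose columns follow the given list of column indices."""
    return [[row[j] for j in order] for row in rows]

def swap_columns(rows, j1, j2):
    """Return rows with columns j1 and j2 exchanged."""
    order = list(range(len(rows[0])))
    order[j1], order[j2] = order[j2], order[j1]
    return reorder_columns(rows, order)

by_id = sort_rows(rows, 0)
print(reorder_columns(by_id, [0, 2, 1]))
print(swap_columns(rows, 0, 2))
```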
<h3><a name="csv"></a>CSV import &amp; export</h3>
<p><span class="inline_code">Datasheet.save()</span> exports the matrix as a CSV file. <span class="inline_code">Datasheet.load()</span> returns a <span class="inline_code">Datasheet</span> from a given CSV file. CSV (comma-separated values) is a simple text format for tabular data, where each line is a row and each value is separated by a comma.</p>
<pre class="brush:python; gutter:false; light:true;">datasheet = Datasheet.load(path, separator=',', decoder=lambda v: v, headers=False)</pre><pre class="brush:python; gutter:false; light:true;">datasheet.save(path, separator=',', encoder=lambda v: v, headers=False)</pre><p>On export, all&nbsp;<span class="inline_code">str</span>, <span class="inline_code">int</span>, <span class="inline_code">float</span>, <span class="inline_code">bool</span> and <span class="inline_code">Date</span> values are converted to Unicode. An <span class="inline_code">encoder</span> can be given for other data types. On import, all values in the datasheet will be Unicode unless a <span class="inline_code">decoder</span> is given.</p>
<p>With <span class="inline_code">headers=True</span>, the <span class="inline_code">Datasheet.fields</span> headers are exported and imported (first line in CSV). In this case, the data type for each column (<span class="inline_code">STRING</span>, <span class="inline_code">INTEGER</span>, <span class="inline_code">FLOAT</span>, <span class="inline_code">BOOLEAN</span> or <span class="inline_code">DATE</span>) is explicitly known and no <span class="inline_code">encoder</span> or <span class="inline_code">decoder</span> is needed.</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.db import Datasheet, STRING, DATE, date
&gt;&gt;&gt;
&gt;&gt;&gt; ds = Datasheet(fields=(('name', STRING), ('date', DATE)))
&gt;&gt;&gt; ds.append((u'Schrödinger', date('1887-08-12')))
&gt;&gt;&gt; ds.append((u'Hofstadter', date('1945-02-15')))
&gt;&gt;&gt;
&gt;&gt;&gt; ds.save('pets.csv', headers=True)
&gt;&gt;&gt;
&gt;&gt;&gt; ds = Datasheet.load('pets.csv', headers=True)
&gt;&gt;&gt; print ds[0]
[u'Schrödinger', Date('1887-08-12 00:00:00')]
</pre></div>
<p>The <span class="inline_code">csv()</span> function can also be used instead of <span class="inline_code">Datasheet.load()</span>:</p>
<div class="example">
<pre class="brush: python;gutter: false; light: true; fontsize: 100; first-line: 1; ">&gt;&gt;&gt; from pattern.db import csv
&gt;&gt;&gt;
&gt;&gt;&gt; for name, date in csv('pets.csv', separator=',', headers=True):
&gt;&gt;&gt; print name, date</pre></div>
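<p>The role of the encoder and decoder is easier to see in a round trip written with Python's standard <span class="inline_code">csv</span> module. This sketch does not use pattern.db and does not reproduce its file layout; the per-column <span class="inline_code">decoders</span> list is an illustrative assumption. It shows that everything is text on export and must be converted back on import.</p>

```python
import csv
import io
from datetime import datetime

rows = [
    ("Schrödinger", datetime(1887, 8, 12)),
    ("Hofstadter", datetime(1945, 2, 15)),
]

# Export: every value is written as text, with a header line first.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["name", "date"])
for name, born in rows:
    writer.writerow([name, born.strftime("%Y-%m-%d %H:%M:%S")])

# Import: all values come back as strings, so apply one decoder per column.
decoders = [str, lambda v: datetime.strptime(v, "%Y-%m-%d %H:%M:%S")]
buffer.seek(0)
reader = csv.reader(buffer)
header = next(reader)
loaded = [
    tuple(decode(value) for decode, value in zip(decoders, record))
    for record in reader
]
print(header)
print(loaded)
```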
<p>&nbsp;</p>
<hr />
<h2><a name="date"></a>Date</h2>
<p>The <span class="inline_code">date()</span> function returns a new <span class="inline_code">Date</span>, a convenient subclass of Python's <span class="inline_code">datetime.datetime</span>. It takes an integer (Unix timestamp), a string or <span class="inline_code">NOW</span>. An optional string input format and output format can be given (e.g., <span class="inline_code">"%d/%m/%y"</span>). The default output format is <span class="inline_code">"YYYY-MM-DD hh:mm:ss"</span>.</p>
<pre class="brush:python; gutter:false; light:true;">d = date(int)</pre><pre class="brush:python; gutter:false; light:true;">d = date(NOW, format=DEFAULT)
</pre><pre class="brush:python; gutter:false; light:true;">d = date(string)</pre><pre class="brush:python; gutter:false; light:true;">d = date(string, format=DEFAULT)</pre><pre class="brush:python; gutter:false; light:true;">d = date(string, inputformat, format=DEFAULT)</pre><pre class="brush:python; gutter:false; light:true;">d = date(year, month, day, format=DEFAULT)</pre><pre class="brush:python; gutter:false; light:true;">d = date(year, month, day, hours, minutes, seconds, format=DEFAULT)</pre><pre class="brush: python;gutter: false; light: true; fontsize: 100; first-line: 1; ">d.year
d.month # 1-12
d.week # 1-52
d.weekday # 1-7
d.day # 1-31
d.hour # 0-23
d.minute # 0-59
d.second # 0-59
d.timestamp # Seconds elapsed since 1/1/1970.</pre><p>If no string input format is given, a number of common formats will be tried:</p>
<table class="border">
<tbody>
<tr>
<td><span class="smallcaps">Format</span></td>
<td><span class="smallcaps">Example</span></td>
</tr>
<tr>
<td><span class="inline_code">%Y-%m-%d %H:%M:%S</span></td>
<td>2010-09-21 09:27:01</td>
</tr>
<tr>
<td><span class="inline_code">%a, %d %b %Y %H:%M:%S %z</span></td>
<td>Tue, 9 Sep 2010 17:58:28 +0000</td>
</tr>
<tr>
<td><span class="inline_code">%Y-%m-%dT%H:%M:%SZ</span></td>
<td>2010-09-20T09:27:01Z</td>
</tr>
<tr>
<td><span class="inline_code">%Y-%m-%dT%H:%M:%S+0000</span></td>
<td>2010-09-20T09:27:01+0000</td>
</tr>
<tr>
<td><span class="inline_code">%Y-%m-%d %H:%M</span></td>
<td>2010-09-20 09:27</td>
</tr>
<tr>
<td><span class="inline_code">%Y-%m-%d</span></td>
<td>2010-09-20</td>
</tr>
<tr>
<td><span class="inline_code">%d/%m/%Y</span></td>
<td>20/09/2010</td>
</tr>
<tr>
<td><span class="inline_code">%d %B %Y</span></td>
<td>9 september 2010</td>
</tr>
<tr>
<td><span class="inline_code">%B %d %Y</span></td>
<td>September 9 2010</td>
</tr>
<tr>
<td><span class="inline_code">%B %d, %Y</span></td>
<td>September 09, 2010</td>
</tr>
</tbody>
</table>
<p>All date formats used in <a class="link-maintenance" href="pattern-web.html">pattern.web</a> (e.g., Twitter search result) are automatically detected.<br />For an overview of date format syntax, see: <a href="http://docs.python.org/library/time.html#time.strftime" target="_blank">http://docs.python.org/library/time.html#time.strftime</a>.<br />&nbsp;</p>
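<p>Trying a list of formats in turn can be sketched with the standard <span class="inline_code">datetime</span> module. The <span class="inline_code">parse_date()</span> helper is hypothetical and covers only a subset of the formats in the table (the ones without a time zone); it is not the library's own detection code. Month names are matched using the current locale, which is English by default.</p>

```python
from datetime import datetime

# A subset of the formats from the table, tried from most to least specific.
FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%SZ",
    "%Y-%m-%d %H:%M",
    "%Y-%m-%d",
    "%d/%m/%Y",
    "%d %B %Y",
    "%B %d %Y",
    "%B %d, %Y",
)

def parse_date(string, formats=FORMATS):
    """Return a datetime for the first format that matches the string."""
    for fmt in formats:
        try:
            return datetime.strptime(string, fmt)
        except ValueError:
            continue  # Format does not match, try the next one.
    raise ValueError("unknown date format: %r" % string)

print(parse_date("2010-09-20"))
print(parse_date("20/09/2010"))
print(parse_date("September 09, 2010"))
```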
<p><span class="smallcaps">Date calculations</span></p>
<p>The <span class="inline_code">time()</span> function can be used to add time to or subtract time from a <span class="inline_code">Date</span>:</p>
<pre class="brush:python; gutter:false; light:true;">time(days=0, seconds=0, minutes=0, hours=0)</pre><div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.db import date, time
&gt;&gt;&gt;
&gt;&gt;&gt; d = date('23 august 2011')
&gt;&gt;&gt; d += time(days=2, hours=5)
&gt;&gt;&gt; print type(d)
&gt;&gt;&gt; print d
&gt;&gt;&gt; print d.year, d.month, d.day
&lt;class 'pattern.db.Date'&gt;
2011-08-25 05:00:00
2011 8 25 </pre></div>
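<p>Since <span class="inline_code">Date</span> is a subclass of <span class="inline_code">datetime.datetime</span>, the same calculation can be reproduced with the standard library alone. The sketch assumes that <span class="inline_code">time()</span> plays the same role as <span class="inline_code">datetime.timedelta</span>; that correspondence is an assumption made for illustration.</p>

```python
from datetime import datetime, timedelta

d = datetime(2011, 8, 23)
d += timedelta(days=2, hours=5)  # Add 2 days and 5 hours.
print(d)
print(d.year, d.month, d.day)

# Subtracting works the same way.
earlier = d - timedelta(minutes=30)
print(earlier)
```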
<p>&nbsp;</p>
<hr />
<h2>See also</h2>
<ul>
<li><a href="http://www.cherrypy.org/" target="_blank">CherryPy</a> (BSD): object-oriented HTTP framework for Python.</li>
<li><a href="https://www.djangoproject.com/" target="_blank">Django</a> (BSD): model-view-controller framework for Python.</li>
</ul>
</div>
</div></div>
</div>
</div>
</div>
</div>
</div>
</div>
<script>
SyntaxHighlighter.all();
</script>
</body>
</html>

@ -0,0 +1,416 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>pattern-de</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link type="text/css" rel="stylesheet" href="../clips.css" />
<style>
/* Small fixes because we omit the online layout.css. */
h3 { line-height: 1.3em; }
#page { margin-left: auto; margin-right: auto; }
#header, #header-inner { height: 175px; }
#header { border-bottom: 1px solid #C6D4DD; }
table { border-collapse: collapse; }
#checksum { display: none; }
</style>
<link href="../js/shCore.css" rel="stylesheet" type="text/css" />
<link href="../js/shThemeDefault.css" rel="stylesheet" type="text/css" />
<script language="javascript" src="../js/shCore.js"></script>
<script language="javascript" src="../js/shBrushXml.js"></script>
<script language="javascript" src="../js/shBrushJScript.js"></script>
<script language="javascript" src="../js/shBrushPython.js"></script>
</head>
<body class="node-type-page one-sidebar sidebar-right section-pages">
<div id="page">
<div id="page-inner">
<div id="header"><div id="header-inner"></div></div>
<div id="content">
<div id="content-inner">
<div class="node node-type-page">
<div class="node-inner">
<div class="breadcrumb">View online at: <a href="http://www.clips.ua.ac.be/pages/pattern-de" class="noexternal" target="_blank">http://www.clips.ua.ac.be/pages/pattern-de</a></div>
<h1>pattern.de</h1>
<!-- Parsed from the online documentation. -->
<div id="node-1534" class="node node-type-page"><div class="node-inner">
<div class="content">
<p><span class="big">The pattern.de module contains a fast part-of-speech tagger for German (identifies nouns, adjectives, verbs, etc. in a sentence) and tools for German verb conjugation and noun singularization &amp; pluralization.</span></p>
<p>It can be used by itself or with other&nbsp;<a href="pattern.html">pattern</a>&nbsp;modules:&nbsp;<a href="pattern-web.html">web</a>&nbsp;|&nbsp;<a href="pattern-db.html">db</a>&nbsp;| <a href="pattern-en.html">en</a>&nbsp;|&nbsp;<a href="pattern-search.html">search</a>&nbsp;|&nbsp;<a href="pattern-vector.html">vector</a>&nbsp;|&nbsp;<a href="pattern-graph.html">graph</a>.</p>
<p><img src="../g/pattern_schema_de.gif" alt="" width="620" height="180" /></p>
<hr />
<h2>Documentation</h2>
<p>The functions in this module take the same parameters and return the same values as their counterparts in <a href="pattern-en.html">pattern.en</a>. Refer to the documentation there for more details.&nbsp;</p>
<h3>Gender</h3>
<p>German nouns and adjectives inflect according to gender. The <span class="inline_code">gender()</span> function predicts the gender (<span class="inline_code">MALE</span>, <span class="inline_code">FEMALE</span>,&nbsp;<span class="inline_code">NEUTRAL</span>) of&nbsp;a given noun with about 75% accuracy:&nbsp;</p>
<div class="example">
<pre class="brush: python;gutter: false; light: true; fontsize: 100; first-line: 1; ">&gt;&gt;&gt; from pattern.de import gender, MALE, FEMALE, NEUTRAL
&gt;&gt;&gt; print gender('Katze')
FEMALE</pre></div>
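<p>To give a feeling for why gender can be predicted at all, here is a deliberately small suffix heuristic. It is not the method used by <span class="inline_code">gender()</span>: the <span class="inline_code">guess_gender()</span> function, the suffix list and the constant values are all hypothetical, and the rule for a final <em>-e</em> is only a tendency.</p>

```python
# Hypothetical stand-ins for the module constants.
MALE, FEMALE, NEUTRAL = "m", "f", "n"

# Well-known German noun endings, checked in order (longer, safer rules first).
SUFFIXES = (
    ("chen", NEUTRAL), ("lein", NEUTRAL),
    ("ung", FEMALE), ("heit", FEMALE), ("keit", FEMALE),
    ("schaft", FEMALE), ("tion", FEMALE),
    ("ling", MALE), ("ismus", MALE),
    ("e", FEMALE),  # Weak tendency only, many exceptions.
)

def guess_gender(noun):
    """Return a gender guess based on the noun's ending, or None if no rule applies."""
    word = noun.lower()
    for suffix, gender in SUFFIXES:
        if word.endswith(suffix):
            return gender
    return None

print(guess_gender("Katze"))
print(guess_gender("Mädchen"))
print(guess_gender("Hund"))
```

<p>A lookup like this leaves many nouns undecided (here <em>Hund</em>), which is one reason a real predictor cannot reach perfect accuracy.</p>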
<h3>Article</h3>
<p>The <span class="inline_code">article()</span> function returns the article (<span class="inline_code">INDEFINITE</span> or <span class="inline_code">DEFINITE</span>) inflected by gender and role (<span class="inline_code">SUBJECT</span>, <span class="inline_code">OBJECT</span>, <span class="inline_code">INDIRECT</span> or <span class="inline_code">PROPERTY</span>).&nbsp;In the following example,&nbsp;<span class="inline_code">role=OBJECT</span>&nbsp;means that the article is used in front of a noun that is the object of the sentence, as in: <em>Ich sehe <span style="text-decoration: underline;">die Katze</span></em> (<em>I see the cat</em>: what do I see?&nbsp;→ the cat).</p>
<div class="example">
<pre class="brush: python;gutter: false; light: true; fontsize: 100; first-line: 1; ">&gt;&gt;&gt; from pattern.de import article, DEFINITE, FEMALE, OBJECT
&gt;&gt;&gt; print article('Katze', DEFINITE, gender=FEMALE, role=OBJECT)
die</pre></div>
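<p>The inflection itself is a small lookup table. The sketch below lists the German definite articles by case and gender; mapping the four roles to nominative, accusative, dative and genitive is an assumption based on the role names, and <span class="inline_code">definite_article()</span> is a hypothetical helper rather than the library function.</p>

```python
# Hypothetical stand-ins for the module constants.
MALE, FEMALE, NEUTRAL = "m", "f", "n"
SUBJECT, OBJECT, INDIRECT, PROPERTY = "nom", "acc", "dat", "gen"

DEFINITE_ARTICLES = {
    #           MALE   FEMALE NEUTRAL
    SUBJECT:  ("der", "die", "das"),  # nominative
    OBJECT:   ("den", "die", "das"),  # accusative
    INDIRECT: ("dem", "der", "dem"),  # dative
    PROPERTY: ("des", "der", "des"),  # genitive
}

def definite_article(gender, role=SUBJECT):
    """Return the singular definite article for the given gender and role."""
    column = (MALE, FEMALE, NEUTRAL).index(gender)
    return DEFINITE_ARTICLES[role][column]

print(definite_article(FEMALE, OBJECT))   # Ich sehe die Katze.
print(definite_article(MALE, OBJECT))     # Ich sehe den Hund.
print(definite_article(MALE, INDIRECT))   # Ich gebe es dem Hund.
```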
<h3>Noun singularization &amp; pluralization</h3>
<p>For German nouns there is <span class="inline_code">singularize()</span> and <span class="inline_code">pluralize()</span>.&nbsp;The implementation uses a statistical approach with 84% accuracy for singularization and 72% for pluralization.</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.de import singularize, pluralize
&gt;&gt;&gt; print singularize('Katzen')
&gt;&gt;&gt; print pluralize('Katze')
Katze
Katzen </pre></div>
<h3>Verb conjugation</h3>
<p>For German verbs there is <span class="inline_code">conjugate()</span>, <span class="inline_code">lemma()</span>, <span class="inline_code">lexeme()</span> and <span class="inline_code">tenses()</span>.&nbsp;The lexicon for verb conjugation contains about 2,000 common German verbs. For unknown verbs it will fall back to a rule-based approach with an accuracy of about 87%.</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.de import conjugate
&gt;&gt;&gt; from pattern.de import INFINITIVE, PRESENT, SG, SUBJUNCTIVE
&gt;&gt;&gt;
&gt;&gt;&gt; print conjugate('war', INFINITIVE)
&gt;&gt;&gt; print conjugate('war', PRESENT, 1, SG, mood=SUBJUNCTIVE)
sein
sei </pre></div>
<p>German verbs have more tenses than English verbs. In particular, the plural differs for each person and there are additional forms for the <span class="inline_code">IMPERATIVE</span> and <span class="inline_code">SUBJUNCTIVE</span> mood.&nbsp;The <span class="inline_code">conjugate()</span> function takes the following optional parameters:</p>
<table class="border">
<tbody>
<tr>
<td class="smallcaps">Tense</td>
<td class="smallcaps">Person</td>
<td class="smallcaps">Number</td>
<td class="smallcaps">Mood</td>
<td class="smallcaps">Aspect</td>
<td class="smallcaps">Alias</td>
<td class="smallcaps">Example</td>
</tr>
<tr>
<td class="inline_code">INFINITIVE</td>
<td class="inline_code">None</td>
<td class="inline_code">None</td>
<td class="inline_code">None</td>
<td class="inline_code">None</td>
<td class="inline_code">"inf"</td>
<td><em>sein</em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1sg"</td>
<td><em>ich <span style="text-decoration: underline;">bin</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sg"</td>
<td><em>du <span style="text-decoration: underline;">bist</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sg"</td>
<td><em>er <span style="text-decoration: underline;">ist</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1pl"</td>
<td><em>wir <span style="text-decoration: underline;">sind</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2pl"</td>
<td><em>ihr <span style="text-decoration: underline;">seid</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3pl"</td>
<td><em>sie <span style="text-decoration: underline;">sind</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">None</td>
<td class="inline_code">None</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PROGRESSIVE</td>
<td class="inline_code">"part"</td>
<td><em>seiend</em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">IMPERATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sg!"</td>
<td><em>sei</em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">IMPERATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1pl!"</td>
<td><em>seien</em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">IMPERATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2pl!"</td>
<td><em>seid</em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1sg?"</td>
<td><em>ich <span style="text-decoration: underline;">sei</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sg?"</td>
<td><em>du <span style="text-decoration: underline;">seiest</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sg?"</td>
<td><em>er <span style="text-decoration: underline;">sei</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1pl?"</td>
<td><em>wir <span style="text-decoration: underline;">seien</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2pl?"</td>
<td><em>ihr <span style="text-decoration: underline;">seiet</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3pl?"</td>
<td><em>sie <span style="text-decoration: underline;">seien</span></em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1sgp"</td>
<td><em>ich <span style="text-decoration: underline;">war</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sgp"</td>
<td><em>du <span style="text-decoration: underline;">warst</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sgp"</td>
<td><em>er <span style="text-decoration: underline;">war</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1ppl"</td>
<td><em>wir <span style="text-decoration: underline;">waren</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2ppl"</td>
<td><em>ihr <span style="text-decoration: underline;">wart</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3ppl"</td>
<td><em>sie <span style="text-decoration: underline;">waren</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">None</td>
<td class="inline_code">None</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PROGRESSIVE</td>
<td class="inline_code">"ppart"</td>
<td><em>gewesen</em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1sgp?"</td>
<td><em>ich <span style="text-decoration: underline;">wäre</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sgp?"</td>
<td><em>du <span style="text-decoration: underline;">wärest</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sgp?"</td>
<td><em>er <span style="text-decoration: underline;">wäre</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1ppl?"</td>
<td><em>wir <span style="text-decoration: underline;">wären</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2ppl?"</td>
<td><em>ihr <span style="text-decoration: underline;">wäret</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3ppl?"</td>
<td><em>sie <span style="text-decoration: underline;">wären</span></em></td>
</tr>
</tbody>
</table>
<p>Instead of optional parameters, a single short alias, or&nbsp;<span class="inline_code">PARTICIPLE</span> or <span class="inline_code">PAST+PARTICIPLE</span> can also be given. With no parameters, the infinitive form of the verb is returned.</p>
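<p>The aliases in the table follow a regular pattern: a person digit, then <span class="inline_code">sg</span> or <span class="inline_code">pl</span> with a <span class="inline_code">p</span> marking the past, and an optional <span class="inline_code">!</span> or <span class="inline_code">?</span> for the mood. The sketch below decodes that pattern in plain Python. The <span class="inline_code">decode_alias()</span> helper and its string values are hypothetical, and the special aliases <span class="inline_code">"inf"</span>, <span class="inline_code">"part"</span> and <span class="inline_code">"ppart"</span> are not handled.</p>

```python
# How the middle part of an alias maps to tense and number, following the table above.
TENSE_NUMBER = {
    "sg":  ("present", "singular"),
    "pl":  ("present", "plural"),
    "sgp": ("past", "singular"),
    "ppl": ("past", "plural"),
}
MOODS = {"!": "imperative", "?": "subjunctive"}

def decode_alias(alias):
    """Return (tense, person, number, mood) for an alias such as "1sg?" or "3ppl"."""
    mood = MOODS.get(alias[-1], "indicative")
    core = alias.rstrip("!?")
    tense, number = TENSE_NUMBER[core[1:]]
    return tense, int(core[0]), number, mood

# A few forms of "sein", copied from the table, keyed by alias.
SEIN_FORMS = {"1sg": "bin", "1sg?": "sei", "2sg!": "sei", "3ppl": "waren", "1sgp?": "wäre"}

for alias in sorted(SEIN_FORMS):
    print(alias, decode_alias(alias), SEIN_FORMS[alias])
```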
<h3>Attributive &amp; predicative adjectives&nbsp;</h3>
<p>German adjectives inflect with an <span class="inline_code">-e</span>,&nbsp;<span class="inline_code">-em</span>, <span class="inline_code">-en</span>, <span class="inline_code">-er</span>, or <span class="inline_code">-es</span> suffix (e.g., <em>neugierig</em> → <em>die neugierige Katze</em>) depending on gender and role. You can get the base form with the <span class="inline_code">predicative()</span> function, or vice versa with&nbsp;<span class="inline_code">attributive()</span>.&nbsp;For predicative, a statistical approach is used with an accuracy of 98%. For attributive, you need to supply gender (<span class="inline_code">MALE</span>, <span class="inline_code">FEMALE</span>, <span class="inline_code">NEUTRAL</span>) and role (<span class="inline_code">SUBJECT</span>, <span class="inline_code">OBJECT</span>, <span class="inline_code">INDIRECT</span>, <span class="inline_code">PROPERTY</span>).</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.de import attributive, predicative
&gt;&gt;&gt; from pattern.de import MALE, FEMALE, SUBJECT, OBJECT
&gt;&gt;&gt;
&gt;&gt;&gt; print predicative('neugierige')
&gt;&gt;&gt; print attributive('neugierig', gender=FEMALE)
&gt;&gt;&gt; print attributive('neugierig', gender=FEMALE, role=OBJECT)
&gt;&gt;&gt; print attributive('neugierig', gender=FEMALE, role=INDIRECT, article="die")
neugierig
neugierige
neugierige
neugierigen </pre></div>
<h3>Parser</h3>
<p>For parsing there is <span class="inline_code">parse()</span>, <span class="inline_code">parsetree()</span> and <span class="inline_code">split()</span>. The <span class="inline_code">parse()</span> function annotates words in the given string with their part-of-speech <a class="link-maintenance" href="mbsp-tags.html">tags</a> (e.g., <span class="postag">NN</span> for nouns and <span class="postag">VB</span> for verbs). The <span class="inline_code">parsetree()</span> function takes a string and returns a tree of nested objects (<span class="inline_code">Text</span>&nbsp;→ <span class="inline_code">Sentence</span>&nbsp;→ <span class="inline_code">Chunk</span>&nbsp;→ <span class="inline_code">Word</span>). The <span class="inline_code">split()</span> function takes the output of <span class="inline_code">parse()</span> and returns a <span class="inline_code">Text</span>. See the pattern.en documentation (<a class="link-maintenance" href="pattern-en.html#tree">here</a>) for how to manipulate <span class="inline_code">Text</span> objects.&nbsp;</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.de import parse, split
&gt;&gt;&gt;
&gt;&gt;&gt; s = parse('Die Katze liegt auf der Matte.')
&gt;&gt;&gt; for sentence in split(s):
&gt;&gt;&gt;     print sentence
Sentence('Die/DT/B-NP/O Katze/NN/I-NP/O liegt/VB/B-VP/O'
'auf/IN/B-PP/B-PNP der/DT/B-NP/I-PNP Matte/NN/I-NP/I-PNP ././O/O')</pre></div>
<p>The parser is built on Gerold Schneider &amp; Martin Volk's&nbsp;<a href="http://www.zora.uzh.ch/28579/" target="_blank">German language model</a>.&nbsp;The accuracy is around 85%. The original <a href="http://www.fi.muni.cz/~xnemcik/nlp/sarrebrugge/handout.pdf" target="_self">STTS</a> tagset is mapped to the <a href="mbsp-tags.html">Penn Treebank</a> tagset. If you need to work with the original tags, you can also use&nbsp;<span class="inline_code">parse()</span> with the optional parameter <span class="inline_code">tagset="STTS"</span>.</p>
<p class="small"><span style="text-decoration: underline;">Reference</span>: Schneider, G. &amp; Volk, M. (1998). <br />Adding manual constraints and lexical look-up to a Brill-tagger for German. <em>Proceedings of ESSLLI-98</em>.&nbsp;</p>
<h3>Sentiment analysis</h3>
<p>There's no&nbsp;<span class="inline_code">sentiment()</span> function for German yet.</p>
<p class="small"><span style="text-decoration: underline;">Note</span>: We did a test by automatically assigning scores (<span class="inline_code">-1.0</span>&nbsp;→ +<span class="inline_code">1.0</span>) to adjectives translated from English, but this approach only had 35% accuracy.</p>
</div>
</div></div>
</div>
</div>
</div>
</div>
</div>
</div>
<script>
SyntaxHighlighter.all();
</script>
</body>
</html>

@ -0,0 +1,367 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>pattern-dev</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link type="text/css" rel="stylesheet" href="../clips.css" />
<style>
/* Small fixes because we omit the online layout.css. */
h3 { line-height: 1.3em; }
#page { margin-left: auto; margin-right: auto; }
#header, #header-inner { height: 175px; }
#header { border-bottom: 1px solid #C6D4DD; }
table { border-collapse: collapse; }
#checksum { display: none; }
</style>
<link href="../js/shCore.css" rel="stylesheet" type="text/css" />
<link href="../js/shThemeDefault.css" rel="stylesheet" type="text/css" />
<script language="javascript" src="../js/shCore.js"></script>
<script language="javascript" src="../js/shBrushXml.js"></script>
<script language="javascript" src="../js/shBrushJScript.js"></script>
<script language="javascript" src="../js/shBrushPython.js"></script>
</head>
<body class="node-type-page one-sidebar sidebar-right section-pages">
<div id="page">
<div id="page-inner">
<div id="header"><div id="header-inner"></div></div>
<div id="content">
<div id="content-inner">
<div class="node node-type-page">
<div class="node-inner">
<div class="breadcrumb">View online at: <a href="http://www.clips.ua.ac.be/pages/pattern-dev" class="noexternal" target="_blank">http://www.clips.ua.ac.be/pages/pattern-dev</a></div>
<h1>pattern.dev</h1>
<!-- Parsed from the online documentation. -->
<div id="node-1480" class="node node-type-page"><div class="node-inner">
<div class="content">
<p><span class="big">Pattern is a web mining module for the Python programming language.</span></p>
<p><span class="big">Pattern is written in Python with extensions in JavaScript. The source code is hosted on GitHub. It is licensed under BSD, so it can be freely incorporated in proprietary applications. Contributions and donations are welcomed.</span></p>
<p>There are six core modules in the <a href="pattern.html">pattern</a> package: <a href="pattern-web.html">web</a> | <a href="pattern-db.html">db</a> | <a href="pattern-text.html">text</a> | <a href="pattern-search.html">search</a> | <a href="pattern-vector.html">vector</a> | <a href="pattern-graph.html">graph</a>.</p>
<p><img src="../g/pattern_schema.gif" alt="" width="620" height="180" /></p>
<hr />
<h2>Topics</h2>
<ul>
<li><a href="#contribute">Contributing</a></li>
<li><a href="#dependencies">Dependencies</a></li>
<li><a href="#documentation">Documentation</a></li>
<li><a href="#code">Coding conventions</a></li>
<li><a href="#quality">Code quality</a></li>
<li><a href="#language">Language support</a></li>
</ul>
<p>&nbsp;</p>
<hr />
<h2><a name="contribute"></a>Contribute</h2>
<p>The source code is hosted on <a href="https://github.com/clips/pattern" target="_blank">GitHub</a> (see <a class="noexternal link-maintenance" href="http://www.github.com/clips/pattern" target="_blank">http://www.github.com/clips/pattern</a>). GitHub is an online project hosting service with version control. Version control tracks changes to the source code, i.e., it can be rolled back to an earlier state or merged with revisions from different contributors.</p>
<p>To work on Pattern, create a <a href="http://help.github.com/fork-a-repo/" target="_blank">fork</a> of the project, a local copy of the source code that can be edited and updated by you alone. You can manage this copy with the free GitHub application (<a class="noexternal link-maintenance" href="http://windows.github.com/" target="_blank">windows</a> | <a class="noexternal link-maintenance" href="http://mac.github.com/" target="_blank">mac</a>). When you are ready, send us a <a href="http://help.github.com/send-pull-requests/" target="_blank">pull</a> request and we will integrate your changes in the main project.</p>
<p>Let us know if you encounter a bug. We prefer if you create an <a href="https://github.com/clips/pattern/issues" target="_blank">issue</a> on GitHub, so that (until fixed) the problem is visible to all users of Pattern. There is a blue button for donations on the main documentation page. Please support the development if you use Pattern commercially.</p>
<p>&nbsp;</p>
<hr />
<h2><a name="dependencies"></a>Dependencies</h2>
<p>There are six core modules in the package:</p>
<table class="border">
<tbody>
<tr>
<td><span class="smallcaps">Module</span></td>
<td><span class="smallcaps">Functionality</span></td>
</tr>
<tr>
<td>pattern.web</td>
<td>Asynchronous requests, web services, web crawler, HTML DOM parser.</td>
</tr>
<tr>
<td>pattern.db</td>
<td>Wrappers for databases (MySQL, SQLite) and CSV-files.</td>
</tr>
<tr>
<td>pattern.text</td>
<td>Base classes for parsers, parse trees and sentiment analysis.</td>
</tr>
<tr>
<td>pattern.search</td>
<td>Pattern matching algorithm for parsed text (syntax &amp; semantics).</td>
</tr>
<tr>
<td>pattern.vector</td>
<td>Vector space model, clustering, classification.</td>
</tr>
<tr>
<td>pattern.graph</td>
<td>Graph analysis &amp; visualization.</td>
</tr>
</tbody>
</table>
<p>There are two helper modules: pattern.metrics (statistics) and canvas.js (visualization).</p>
<h3>Design philosophy</h3>
<p>Pattern is written in Python, with JavaScript extensions for data visualization (graph.js and canvas.js). The package works out of the box. If C/C++ code is bundled for performance (e.g., LIBSVM), it includes precompiled binaries for all major platforms (Windows, Linux, Mac).</p>
<p>Pattern modules are standalone. If a module imports another module, it fails silently if that module is not present. For example, pattern.text implements a parser that uses a Perceptron language model when pattern.vector is present, but falls back to a lexicon of known words and rules for unknown words if used by itself. A single module can have a lot of interdependent classes, hence the large __init__.py files.</p>
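<p>The "fail silently, then fall back" behaviour can be sketched as follows. This is only an illustration of the idea, not Pattern's actual import code: the module name <span class="inline_code">_hypothetical_language_model</span>, the tiny lexicon and the suffix rule are all assumptions made up for the example.</p>

```python
# Hypothetical sketch: "_hypothetical_language_model" is a made-up module name
# standing in for an optional dependency. It is not part of Pattern.
try:
    import _hypothetical_language_model as model   # Optional; may be absent.
except ImportError:
    model = None                                    # Fail silently.

# Toy lexicon of known words (assumption, for illustration only).
LEXICON = {"the": "DT", "cat": "NN", "sits": "VBZ"}

def tag(word):
    """ Returns a part-of-speech tag for the given word.
    """
    if model is not None:
        return model.tag(word)      # Preferred path when the module is present.
    if word in LEXICON:
        return LEXICON[word]        # Known word: lexicon lookup.
    if word.endswith("ly"):
        return "RB"                 # Unknown word: simple suffix rule.
    return "NN"                     # Default tag for unknown words.
```

<p>Without the optional module, <span class="inline_code">tag('cat')</span> comes from the lexicon and <span class="inline_code">tag('quickly')</span> from the suffix rule, so the caller never has to know which path was taken.</p>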
<p>Pattern modules can bundle other BSD-licensed Python projects (e.g., BeautifulSoup). For larger projects or GPL-licensed projects, it provides code to map data structures.</p>
<h3>Base classes</h3>
<p>In pattern.web, each web service (e.g., Google, Twitter) inherits from <span class="inline_code">SearchEngine</span> and returns <span class="inline_code">Result</span> objects. Each MediaWiki web service (e.g., Wikipedia, Wiktionary) inherits from <span class="inline_code">MediaWiki</span>.</p>
<p>In pattern.db, each database engine is wrapped by <span class="inline_code">Database</span>. It supports MySQL and SQLite, with future plans for MongoDB. See <span class="inline_code">Database</span><span class="inline_code">.connect()</span>, <span class="inline_code">escape()</span>, <span class="inline_code">_field_SQL()</span> and <span class="inline_code">_update()</span>.</p>
<p>In pattern.text, each language inherits from <span class="inline_code">Parser</span>, having a lexicon of known words and an optional language model. Case studies for <a class="link-maintenance" href="http://www.clips.ua.ac.be/pages/using-wikicorpus-nltk-to-build-a-spanish-part-of-speech-tagger">Spanish</a> and <a class="link-maintenance" href="http://www.clips.ua.ac.be/pages/using-wiktionary-to-build-an-italian-part-of-speech-tagger">Italian</a> show how to train a <span class="inline_code">Lexicon</span>. A bundled pattern.vector example shows how to train a Perceptron <span class="inline_code">Model</span>.</p>
<p>In pattern.vector, each classifier inherits from <span class="inline_code">Classifier</span> (e.g., KNN, SVM). Each clustering algorithm is available from <span class="inline_code">Model.cluster()</span>.</p>
<p>In pattern.graph, subclasses of <span class="inline_code">Node</span> or <span class="inline_code">Edge</span> can be used with (subclasses of) <span class="inline_code">Graph</span> by setting the <span class="inline_code">base</span> parameter of <span class="inline_code">Graph.add_node()</span> and <span class="inline_code">add_edge()</span>. Each layout algorithm (e.g., force-based springs) inherits from <span class="inline_code">GraphLayout</span>.</p>
<p>&nbsp;</p>
<hr />
<h2><a name="documentation"></a>Documentation</h2>
<p>Each function or method has a docstring:</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">def find(match=lambda item: False, list=[]):
    """ Returns the first item in the given list for which match(item) is True.
    """
    for item in list:
        if match(item) is True:
            return item</pre></div>
<p>The docstring provides a concise description of the type of input and output. In Pattern, a docstring starts with "Returns" (for a function) or "Yields" (for a property). Each function has a unit test, to verify that it is fit for use. Each function has an engaging example, bundled in the package or in the documentation.</p>
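<p>As an illustration of the "one unit test per function" rule, a test for <span class="inline_code">find()</span> could look roughly like this. It is a sketch using the standard <span class="inline_code">unittest</span> module; the class name <span class="inline_code">TestFind</span> and the helper <span class="inline_code">run()</span> are assumptions for the example and are not taken from Pattern's own test suite.</p>

```python
import unittest

def find(match=lambda item: False, list=[]):
    """ Returns the first item in the given list for which match(item) is True.
    """
    for item in list:
        if match(item) is True:
            return item

class TestFind(unittest.TestCase):

    def test_find(self):
        # The first matching item is returned.
        self.assertEqual(find(lambda x: x > 2, [1, 2, 3, 4]), 3)
        # None is returned if nothing matches.
        self.assertEqual(find(lambda x: x > 9, [1, 2, 3, 4]), None)

def run():
    # Hypothetical helper: runs the test case and returns the result object.
    suite = unittest.TestLoader().loadTestsFromTestCase(TestFind)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```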
<p>Pattern does not have a documentation framework. The documentation is written by hand and in constant revision. Please report spelling errors and examples with bugs.</p>
<p>&nbsp;</p>
<hr />
<h2><a name="code"></a>Coding conventions</h2>
<h3>Whitespace</h3>
<p>The source code is not strict <a href="http://www.python.org/dev/peps/pep-0008/" target="_blank">PEP8</a>. For example, additional whitespace is used so that property assignments or inline comments are vertically aligned as a block:</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">class Table(object):
    def __init__(self, name, database):
        """ A collection of rows with one or more fields of a certain type.
        """
        self.database    = database
        self.name        = name
        self.fields      = [] # List of field names (i.e., column names).
        self.schema      = {} # Dictionary of (field, Schema)-items.
        self.default     = {} # Default values for Table.insert().
        self.primary_key = None
        self._update()</pre></div>
<p>Whitespace is sometimes used to align dictionary keys and values:</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">url = URL('http://search.twitter.com/search.json?', method=GET, query={
       'q': query,
    'page': start,
     'rpp': min(count, 100)
})</pre></div>
<h3>Class and function names</h3>
<p>Single words are preferred for class names. Compound terms use CamelCase, e.g., <span class="inline_code">SearchEngine</span> or <span class="inline_code">AsynchronousRequest</span>. Single, descriptive words are preferred for functions and methods. Compound terms use lowercase_with_underscore. If a method takes no arguments, it is a property:</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">class AsynchronousRequest:
    @property
    def done(self):
        return not self._thread.isAlive() # We'd prefer "_thread.alive".</pre></div>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">while not request.done:
    ... </pre></div>
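<p>A self-contained version of this convention might look like the sketch below. The class <span class="inline_code">BackgroundTask</span> is a hypothetical stand-in written for this example (it is not the <span class="inline_code">AsynchronousRequest</span> class from pattern.web), and it uses <span class="inline_code">Thread.is_alive()</span>, the spelling available in current Python versions.</p>

```python
import threading
import time

class BackgroundTask(object):
    # Hypothetical stand-in for illustration; not a Pattern class.

    def __init__(self, function, *args):
        """ Runs the given function in a background thread.
        """
        self._thread = threading.Thread(target=function, args=args)
        self._thread.start()

    @property
    def done(self):
        # Yields True when the background thread has finished.
        return not self._thread.is_alive()

task = BackgroundTask(time.sleep, 0.1)
while not task.done:        # Reads like an attribute: no parentheses needed.
    time.sleep(0.01)
```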
<h3>Variable names</h3>
<p>The source code uses single character names abundantly. For example, dictionary <span style="text-decoration: underline;">k</span>eys and <span style="text-decoration: underline;">v</span>alues are <span class="inline_code">k</span> and <span class="inline_code">v</span>, a string is <span class="inline_code">s</span>. This is done to make the structure of the algorithm stand out (i.e., the actual function and method calls):</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">def normalize(s, punctuation='!?.:;,()[] '):
    s = s.decode('utf-8')
    s = s.lower()
    s = s.strip(punctuation)
    return s</pre></div>
<p>Frequently used single character variable names:</p>
<table class="border">
<tbody>
<tr>
<td style="text-align: center;"><span class="smallcaps">Variable</span></td>
<td><span class="smallcaps">Meaning</span></td>
<td><span class="smallcaps">Example</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">a</span></td>
<td>array, all</td>
<td><span class="inline_code">a = [normalize(w) for w in words]</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">b</span></td>
<td>boolean</td>
<td><span class="inline_code">while b is False:</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">d</span></td>
<td>distance, document</td>
<td><span class="inline_code">d = distance(v1, v2)</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">e</span></td>
<td>element</td>
<td><span class="inline_code">e = html.find('#nav')</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">f</span></td>
<td>file, filter, function</td>
<td><span class="inline_code">f = open('data.csv', 'r')</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">i</span></td>
<td>index</td>
<td><span class="inline_code">for i in range(len(matrix)):</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">j</span></td>
<td>index</td>
<td><span class="inline_code">for j in range(len(matrix[i])):</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">k</span></td>
<td>key</td>
<td><span class="inline_code">for k in vector.keys():</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">n</span></td>
<td>list length</td>
<td><span class="inline_code">n = len(a)</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">p</span></td>
<td>parser, pattern</td>
<td><span class="inline_code">p = pattern.search.compile('NN')</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">q</span></td>
<td>query</td>
<td><span class="inline_code">for r in twitter.search(q):</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">r</span></td>
<td>result, row</td>
<td><span class="inline_code">for r in csv('data.csv'):</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">s</span></td>
<td>string</td>
<td><span class="inline_code">s = s.decode('utf-8').strip()</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">t</span></td>
<td>time</td>
<td><span class="inline_code">t = time.time() - t0</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">v</span></td>
<td>value, vector</td>
<td><span class="inline_code">for k, v in vector.items():</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">w</span></td>
<td>word</td>
<td><span class="inline_code">for i, w in enumerate(sentence.words):</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">x</span></td>
<td>horizontal position</td>
<td><span class="inline_code">node.x = 0</span></td>
</tr>
<tr>
<td style="text-align: center;"><span class="inline_code">y</span></td>
<td>vertical position</td>
<td><span class="inline_code">node.y = 0</span></td>
</tr>
</tbody>
</table>
<h3>Dictionaries</h3>
<p>The source code uses dictionaries abundantly. Dictionaries are fast for lookup. For example, pattern.vector represents vectors as sparse feature&nbsp;→ weight dictionaries:</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">v1 = document1.vector
v2 = document2.vector
cos = sum(v1.get(w,0) * f for w, f in v2.items()) / (norm(v1) * norm(v2) or 1)</pre></div>
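<p>The one-liner above depends on <span class="inline_code">document.vector</span> and a <span class="inline_code">norm()</span> helper. A standalone sketch with plain dictionaries is shown below; the helper names <span class="inline_code">norm()</span> and <span class="inline_code">cosine_similarity()</span> are defined here for illustration and are not claimed to match the pattern.vector implementation.</p>

```python
from math import sqrt

def norm(v):
    """ Returns the Euclidean length of a sparse {feature: weight} vector.
    """
    return sqrt(sum(w * w for w in v.values()))

def cosine_similarity(v1, v2):
    """ Returns the cosine similarity of two sparse {feature: weight} vectors.
    """
    # Features missing from v1 count as 0, so only shared features contribute.
    dot = sum(v1.get(f, 0) * w for f, w in v2.items())
    # "or 1" avoids a division by zero for empty vectors.
    return dot / (norm(v1) * norm(v2) or 1)

v1 = {"cat": 1.0, "sat": 1.0}
v2 = {"cat": 1.0, "mat": 1.0}
similarity = cosine_similarity(v1, v2)   # One shared feature out of two each.
```

<p>Because only the features that are actually present are stored, the cost of the dot product grows with the number of non-zero features, not with the size of the vocabulary.</p>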
<p>Pattern algorithms are <a class="link-maintenance" href="pattern-metrics.html#profile">profiled</a> and optimized with caching mechanisms.</p>
<h3>List comprehensions</h3>
<p>The source code uses list comprehensions abundantly. They are concise, and often faster than <span class="inline_code">map()</span>. However, they can also be harder to read (a comment should be added).</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">def words(s, punctuation='!?.:;,()[] '):
    return [w.strip(punctuation) for w in s.split()]
</pre></div>
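<p>For comparison, here is the same function written once as a list comprehension and once with <span class="inline_code">map()</span>. The name <span class="inline_code">words_map()</span> is introduced only for this sketch.</p>

```python
def words(s, punctuation='!?.:;,()[] '):
    # Split on whitespace, then strip punctuation from each token.
    return [w.strip(punctuation) for w in s.split()]

def words_map(s, punctuation='!?.:;,()[] '):
    # Same result with map(); list() is needed on Python 3, where map() is lazy.
    return list(map(lambda w: w.strip(punctuation), s.split()))
```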
<h3>Ternary operator</h3>
<p>Previous versions of Pattern supported Python 2.4, which does not have the ternary operator (single-line if). A part of the source code still uses a boolean condition to emulate it:</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">s = s.lower() if lowercase is True else s # Python 2.5+</pre></div>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">s = lowercase is True and s.lower() or s # Python 2.4</pre></div>
<p>With boolean conditions, care must be taken for values <span class="inline_code">0</span>, <span class="inline_code">''</span>, <span class="inline_code">[]</span>, <span class="inline_code">()</span>, <span class="inline_code">{}</span>, and <span class="inline_code">None</span>, since they evaluate as&nbsp;<span class="inline_code">False</span> and trigger the or-clause.</p>
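<p>The pitfall is easiest to see side by side. The function names below are made up for this sketch; the list-wrapping variant is a commonly used workaround for the and/or idiom and is not claimed to appear in Pattern's source.</p>

```python
def pick_and_or(condition, a, b):
    # Python 2.4 idiom: returns b whenever a is falsy (0, '', [], None, ...),
    # even if the condition is True.
    return condition and a or b

def pick_and_or_safe(condition, a, b):
    # Workaround: wrap the values in lists, which are never empty here,
    # so the and-clause always succeeds when the condition is True.
    return (condition and [a] or [b])[0]

def pick_ternary(condition, a, b):
    # Python 2.5+ conditional expression: a is returned as-is.
    return a if condition else b
```

<p>With <span class="inline_code">a = ''</span> and a true condition, the first function returns <span class="inline_code">b</span>, while the other two return the empty string as intended.</p>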
<p>&nbsp;</p>
<hr />
<h2><a name="quality"></a>Code quality</h2>
<p>The source code has about 25,000 lines of Python code (25% unit tests), 5,000 lines of JavaScript, and 20,000 lines of bundled dependencies (BeautifulSoup, PDFMiner, PyWordNet, LIBSVM, LIBLINEAR, etc.). To evaluate the code quality,&nbsp;<a href="http://www.logilab.org/857" target="_blank">pylint</a>&nbsp;can be used:</p>
<div class="install">
<pre class="gutter:false; light:true;">&gt; cd pattern-2.x
&gt; pylint pattern --rcfile=.pylintrc</pre></div>
<p>Important pylint id's are those starting with <span class="inline_code">E</span> (= possible bugs).</p>
<p>The&nbsp;<span class="inline_code">.pylintrc</span>&nbsp;configuration file defines a number of custom settings:</p>
<ul>
<li>Instead of 80 characters per line, 100 characters are allowed.</li>
<li>Ignore pylint id <span class="inline_code">C0103</span>: single-character variable names are allowed.</li>
<li>Ignore pylint id <span class="inline_code">W0142</span>:&nbsp;<span class="inline_code">*args</span> and <span class="inline_code">**kwargs</span> are allowed.</li>
<li>Ignore bundled dependencies.</li>
</ul>
<p>The source code scores about 7.38 / 10. A known issue is the absence of docstrings in unit tests.</p>
<p>&nbsp;</p>
<hr />
<h2><a name="language"></a>Language support</h2>
<p>Pattern currently has natural language processing tools (e.g., pattern.en, pattern.es) for most languages on the to-do list.&nbsp;There is no sentiment analysis yet for Spanish and German. Chinese is an open task.</p>
<table class="border">
<tbody>
<tr>
<td><span class="smallcaps">Language</span></td>
<td style="text-align: center;"><span class="smallcaps">Code</span></td>
<td style="text-align: center;"><span class="smallcaps">Speakers</span></td>
<td><span class="smallcaps">Example countries</span></td>
</tr>
<tr>
<td>Mandarin</td>
<td style="text-align: center;"><span class="inline_code">cmn</span></td>
<td style="text-align: center;">955M</td>
<td>China + Taiwan (945), Singapore (3)</td>
</tr>
<tr>
<td><s>Spanish</s></td>
<td style="text-align: center;"><span class="inline_code">es</span></td>
<td style="text-align: center;">350M</td>
<td>Argentina (40), Colombia (40), Mexico (100), Spain (45)</td>
</tr>
<tr>
<td><s>English</s></td>
<td style="text-align: center;"><span class="inline_code">en</span></td>
<td style="text-align: center;">340M</td>
<td>Canada (30), United Kingdom (60), United States (300)</td>
</tr>
<tr>
<td><s>German</s></td>
<td style="text-align: center;"><span class="inline_code">de</span></td>
<td style="text-align: center;">100M</td>
<td>Austria (10), Germany (80), Switzerland (7)</td>
</tr>
<tr>
<td><s>French</s></td>
<td style="text-align: center;"><span class="inline_code">fr</span></td>
<td style="text-align: center;">70M</td>
<td>France (65), Côte d'Ivoire (20)</td>
</tr>
<tr>
<td><s>Italian</s></td>
<td style="text-align: center;"><span class="inline_code">it</span></td>
<td style="text-align: center;">60M</td>
<td>Italy (60)</td>
</tr>
<tr>
<td><s>Dutch</s></td>
<td style="text-align: center;"><span class="inline_code">nl</span></td>
<td style="text-align: center;">25M</td>
<td>The Netherlands (25), Belgium (5), Suriname (1)</td>
</tr>
</tbody>
</table>
<p>There are two case studies that demonstrate how to build a pattern.xx language module:</p>
<ul>
<li><a href="http://www.clips.ua.ac.be/pages/using-wiktionary-to-build-an-italian-part-of-speech-tagger">Using Wiktionary to build an Italian part-of-speech tagger</a></li>
<li><a href="http://www.clips.ua.ac.be/pages/using-wikicorpus-nltk-to-build-a-spanish-part-of-speech-tagger">Using Wikicorpus &amp; NLTK to build a Spanish part-of-speech tagger</a></li>
</ul>
</div>
</div></div>
</div>
</div>
</div>
</div>
</div>
</div>
<script>
SyntaxHighlighter.all();
</script>
</body>
</html>

@ -0,0 +1,733 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>pattern-en</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link type="text/css" rel="stylesheet" href="../clips.css" />
<style>
/* Small fixes because we omit the online layout.css. */
h3 { line-height: 1.3em; }
#page { margin-left: auto; margin-right: auto; }
#header, #header-inner { height: 175px; }
#header { border-bottom: 1px solid #C6D4DD; }
table { border-collapse: collapse; }
#checksum { display: none; }
</style>
<link href="../js/shCore.css" rel="stylesheet" type="text/css" />
<link href="../js/shThemeDefault.css" rel="stylesheet" type="text/css" />
<script language="javascript" src="../js/shCore.js"></script>
<script language="javascript" src="../js/shBrushXml.js"></script>
<script language="javascript" src="../js/shBrushJScript.js"></script>
<script language="javascript" src="../js/shBrushPython.js"></script>
</head>
<body class="node-type-page one-sidebar sidebar-right section-pages">
<div id="page">
<div id="page-inner">
<div id="header"><div id="header-inner"></div></div>
<div id="content">
<div id="content-inner">
<div class="node node-type-page">
<div class="node-inner">
<div class="breadcrumb">View online at: <a href="http://www.clips.ua.ac.be/pages/pattern-en" class="noexternal" target="_blank">http://www.clips.ua.ac.be/pages/pattern-en</a></div>
<h1>pattern.en</h1>
<!-- Parsed from the online documentation. -->
<div id="node-1383" class="node node-type-page"><div class="node-inner">
<div class="content">
<p class="big">The pattern.en module contains a fast part-of-speech tagger for English (identifies nouns, adjectives, verbs, etc. in a sentence), sentiment analysis, tools for English verb conjugation and noun singularization &amp; pluralization, and a WordNet interface.</p>
<p>It can be used by itself or with other <a href="pattern.html">pattern</a> modules: <a href="pattern-web.html">web</a> | <a href="pattern-db.html">db</a>&nbsp;| en | <a href="pattern-search.html">search</a> | <a href="pattern-vector.html">vector</a> | <a href="pattern-graph.html">graph</a>.</p>
<p><img src="../g/pattern_schema.gif" alt="" width="620" height="180" /></p>
<hr />
<h2>Documentation</h2>
<ul>
<li><a href="#article">Indefinite article</a></li>
<li><a href="#pluralization">Pluralization + singularization</a></li>
<li><a href="#comparative">Comparative + superlative</a></li>
<li><a href="#conjugation">Verb conjugation</a></li>
<li><a href="#quantify">Quantification</a></li>
<li><a href="#spelling">Spelling</a></li>
<li><a href="#ngram">n-grams</a></li>
<li><a href="#parser">Parser</a>&nbsp;<span class="smallcaps link-maintenance">(tokenizer, tagger, chunker)</span></li>
<li><a href="#tree">Parse trees</a></li>
<li><a href="#sentiment">Sentiment</a></li>
<li><a href="#modality">Mood &amp; modality</a></li>
<li><a href="#wordnet">WordNet</a></li>
<li><a href="#wordlist">Wordlists</a></li>
</ul>
<p>&nbsp;</p>
<hr />
<h2><a name="article"></a>Indefinite article</h2>
<p>The article is the most common determiner (<span class="postag">DT</span>) in English. It defines whether the successive noun is definite (<em><span style="text-decoration: underline;">the</span> cat</em>) or indefinite (<em><span style="text-decoration: underline;">a</span> cat</em>). The definite article is always <em>the</em>. The indefinite article can be&nbsp;<em>a</em> or <em>an</em>&nbsp;depending on how the successive noun is pronounced.</p>
<pre class="brush:python; gutter:false; light:true;">article(word, function=INDEFINITE) # DEFINITE | INDEFINITE</pre><pre class="brush:python; gutter:false; light:true;">referenced(word, article=INDEFINITE) # Returns article + word.
</pre><div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.en import referenced
&gt;&gt;&gt;
&gt;&gt;&gt; print referenced('university')
&gt;&gt;&gt; print referenced('hour')
a university
an hour</pre></div>
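<p>To give an idea of why pronunciation matters more than spelling, here is a deliberately tiny rule-based sketch. The rules and the function names <span class="inline_code">indefinite_article()</span> and <span class="inline_code">with_article()</span> are assumptions made up for this example; they cover only a handful of cases (e.g., they would get <em>uninteresting</em> wrong) and are not the rules used by pattern.en.</p>

```python
import re

# Toy rules only: (pattern, article), first match wins.
RULES = [
    (r"^(hour|honest|heir)", "an"),   # Silent h: sounds like a vowel.
    (r"^(uni|use|eu)", "a"),          # Starts with a "you" sound.
    (r"^[aeiou]", "an"),              # Other words starting with a vowel.
    (r"", "a"),                       # Everything else.
]

def indefinite_article(word):
    """ Returns "a" or "an" for the given word, using the toy rules above.
    """
    for pattern, article in RULES:
        if re.search(pattern, word.lower()):
            return article

def with_article(word):
    """ Returns the indefinite article + the given word.
    """
    return "%s %s" % (indefinite_article(word), word)
```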
<p><span class="small"><span style="text-decoration: underline;">Reference</span>: Granger, M. (2006). <em>Ruby Linguistics Framework</em>, </span><span class="small">http://deveiate.org/projects/Linguistics</span></p>
<p>&nbsp;</p>
<hr />
<h2><a name="pluralization"></a>Pluralization + singularization</h2>
<p>The <span class="inline_code">pluralize()</span> function returns the plural form of a singular noun. The <span class="inline_code">singularize()</span> function returns the singular form of a plural noun. The <span class="inline_code">pos</span> parameter (part-of-speech) can be set to <span class="inline_code">NOUN</span> or <span class="inline_code">ADJECTIVE</span>, but only a small number of possessive adjectives inflect (e.g., <em>my</em>&nbsp;→ <em>our</em>). The <span class="inline_code">custom</span> dictionary is for user-defined replacements. Accuracy of the algorithms is 96%.</p>
<pre class="brush:python; gutter:false; light:true;">pluralize(word, pos=NOUN, custom={}, classical=True)</pre><pre class="brush:python; gutter:false; light:true;">singularize(word, pos=NOUN, custom={})</pre><div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.en import pluralize, singularize
&gt;&gt;&gt;
&gt;&gt;&gt; print pluralize('child')
&gt;&gt;&gt; print singularize('wolves')
children
wolf
</pre></div>
<p><span class="small"><span style="text-decoration: underline;">Reference</span>: <br />Conway, D. (1998). An Algorithmic Approach to English Pluralization. <em>Proceedings of the 2nd Perl conference</em>.<br />Ferrer, B. (2005). <em>Inflector for Python</em>, http://www.bermi.org/projects/inflector</span></p>
<p>&nbsp;</p>
<hr />
<h2><a name="comparative"></a>Comparative + superlative</h2>
<p>The <span class="inline_code">comparative()</span> and <span class="inline_code">superlative()</span> functions give the comparative or superlative form of an adjective. Words with three or more syllables (e.g., <em>fantastic</em>) are simply preceded by <em>more</em> or <em>most</em>.</p>
<pre class="brush:python; gutter:false; light:true;">comparative(adjective) # big =&gt; bigger</pre><pre class="brush:python; gutter:false; light:true;">superlative(adjective) # big =&gt; biggest</pre><div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.en import comparative, superlative
&gt;&gt;&gt;
&gt;&gt;&gt; print comparative('bad')
&gt;&gt;&gt; print superlative('bad')
worse
worst
</pre></div>
<p>&nbsp;</p>
<hr />
<h2><a name="conjugation"></a>Verb conjugation</h2>
<p>The pattern.en module has a lexicon of 8,500 common English verbs and their conjugated forms (infinitive, 3rd singular present, present participle, past and past participle; verbs such as <em>be</em>&nbsp;may have more forms). Some verbs can also be negated, including&nbsp;<em>be</em>, <em>can</em>, <em>do</em>, <em>will</em>, <em>must</em>, <em>have</em>, <em>may</em>, <em>need</em>, <em>dare</em>, <em>ought</em>.</p>
<pre class="brush:python; gutter:false; light:true;">conjugate(verb,
    tense = PRESENT,        # INFINITIVE, PRESENT, PAST, FUTURE
   person = 3,              # 1, 2, 3 or None
   number = SINGULAR,       # SG, PL
     mood = INDICATIVE,     # INDICATIVE, IMPERATIVE, CONDITIONAL, SUBJUNCTIVE
   aspect = IMPERFECTIVE,   # IMPERFECTIVE, PERFECTIVE, PROGRESSIVE
  negated = False,          # True or False
    parse = True) </pre><pre class="brush:python; gutter:false; light:true;">lemma(verb) # Base form, e.g., are =&gt; be.</pre><pre class="brush:python; gutter:false; light:true;">lexeme(verb) # List of possible forms: be =&gt; is, was, ...</pre><pre class="brush:python; gutter:false; light:true;">tenses(verb) # List of possible tenses of the given form.
</pre><p>The&nbsp;<span class="inline_code">conjugate()</span> function takes the following optional parameters:</p>
<table class="border">
<tbody>
<tr>
<td style="text-align: left;"><span class="smallcaps">Tense</span></td>
<td style="text-align: left;"><span class="smallcaps">Person</span></td>
<td style="text-align: left;"><span class="smallcaps">Number</span></td>
<td style="text-align: left;"><span class="smallcaps">Mood</span></td>
<td style="text-align: left;"><span class="smallcaps">Aspect</span></td>
<td style="text-align: left;"><span class="smallcaps">Alias</span></td>
<td style="text-align: center;"><span class="smallcaps">Tag</span></td>
<td style="text-align: left;"><span class="smallcaps">Example</span></td>
</tr>
<tr>
<td><span class="inline_code">INFINITIVE</span></td>
<td><span class="inline_code">None</span></td>
<td><span class="inline_code">None</span></td>
<td><span class="inline_code">None</span></td>
<td><span class="inline_code">None</span></td>
<td><span class="inline_code">"inf"</span></td>
<td style="text-align: center;"><span class="postag">VB</span></td>
<td><em>be</em></td>
</tr>
<tr>
<td><span class="inline_code">PRESENT</span></td>
<td><span class="inline_code">1</span></td>
<td><span class="inline_code">SG</span></td>
<td><span class="inline_code">INDICATIVE</span></td>
<td><span class="inline_code">IMPERFECTIVE</span></td>
<td><span class="inline_code">"1sg"</span></td>
<td style="text-align: center;"><span class="postag">VBP</span></td>
<td><em>I <span style="text-decoration: underline;">am</span></em></td>
</tr>
<tr>
<td><span class="inline_code">PRESENT</span></td>
<td><span class="inline_code">2</span></td>
<td><span class="inline_code">SG</span></td>
<td><span class="inline_code">INDICATIVE</span></td>
<td><span class="inline_code">IMPERFECTIVE</span></td>
<td><span class="inline_code">"2sg"</span></td>
<td style="text-align: center;">&nbsp;·</td>
<td><em>you <span style="text-decoration: underline;">are</span></em></td>
</tr>
<tr>
<td><span class="inline_code">PRESENT</span></td>
<td><span class="inline_code">3</span></td>
<td><span class="inline_code">SG</span></td>
<td><span class="inline_code">INDICATIVE</span></td>
<td><span class="inline_code">IMPERFECTIVE</span></td>
<td><span class="inline_code">"3sg"</span></td>
<td style="text-align: center;"><span class="postag">VBZ</span></td>
<td><em>he <span style="text-decoration: underline;">is</span></em></td>
</tr>
<tr>
<td><span class="inline_code">PRESENT</span></td>
<td><span class="inline_code">None</span></td>
<td><span class="inline_code">PL</span></td>
<td><span class="inline_code">INDICATIVE</span></td>
<td><span class="inline_code">IMPERFECTIVE</span></td>
<td><span class="inline_code">"pl"</span></td>
<td style="text-align: center;">&nbsp;·</td>
<td><em>are</em></td>
</tr>
<tr>
<td><span class="inline_code">PRESENT</span></td>
<td><span class="inline_code">None</span></td>
<td><span class="inline_code">None</span></td>
<td><span class="inline_code">INDICATIVE</span></td>
<td><span class="inline_code">PROGRESSIVE</span></td>
<td><span class="inline_code">"part"</span></td>
<td style="text-align: center;"><span class="postag">VBG</span></td>
<td><em>being</em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td><span class="inline_code">PAST</span></td>
<td><span class="inline_code">None</span></td>
<td><span class="inline_code">None</span></td>
<td><span class="inline_code">None</span></td>
<td><span class="inline_code">None</span></td>
<td><span class="inline_code">"p"</span></td>
<td style="text-align: center;"><span class="postag">VBD</span></td>
<td><em>were</em></td>
</tr>
<tr>
<td><span class="inline_code">PAST</span></td>
<td><span class="inline_code"><span>1</span></span></td>
<td><span class="inline_code"><span>SG</span></span></td>
<td><span class="inline_code">INDICATIVE</span></td>
<td><span class="inline_code">IMPERFECTIVE</span></td>
<td><span class="inline_code">"1sgp"</span></td>
<td style="text-align: center;">&nbsp;·</td>
<td><em>I <span style="text-decoration: underline;">was</span></em></td>
</tr>
<tr>
<td><span class="inline_code">PAST</span></td>
<td><span class="inline_code"><span>2</span></span></td>
<td><span class="inline_code"><span>SG</span></span></td>
<td><span class="inline_code"><span>INDICATIVE</span></span></td>
<td><span class="inline_code">IMPERFECTIVE</span></td>
<td><span class="inline_code">"2sgp"</span></td>
<td style="text-align: center;">&nbsp;·</td>
<td><em>you <span style="text-decoration: underline;">were</span></em></td>
</tr>
<tr>
<td><span class="inline_code">PAST</span></td>
<td><span class="inline_code"><span>3</span></span></td>
<td><span class="inline_code"><span>SG</span></span></td>
<td><span class="inline_code"><span>INDICATIVE</span></span></td>
<td><span class="inline_code">IMPERFECTIVE</span></td>
<td><span class="inline_code">"3sgp"</span></td>
<td style="text-align: center;">&nbsp;·</td>
<td><em>he <span style="text-decoration: underline;">was</span></em></td>
</tr>
<tr>
<td><span class="inline_code">PAST</span></td>
<td><span class="inline_code"><span>None</span></span></td>
<td><span class="inline_code"><span>PL</span></span></td>
<td><span class="inline_code"><span>INDICATIVE</span></span></td>
<td><span class="inline_code">IMPERFECTIVE</span></td>
<td><span class="inline_code">"ppl"</span></td>
<td style="text-align: center;">&nbsp;·</td>
<td><em>were</em></td>
</tr>
<tr>
<td style="text-align: left;"><span class="inline_code">PAST</span></td>
<td style="text-align: left;"><span class="inline_code">None</span></td>
<td style="text-align: left;"><span class="inline_code">None</span></td>
<td style="text-align: left;"><span class="inline_code">INDICATIVE</span></td>
<td style="text-align: left;"><span class="inline_code"><span>PROGRESSIVE</span></span></td>
<td style="text-align: left;"><span class="inline_code">"ppart"</span></td>
<td style="text-align: center;"><span class="postag">VBN</span></td>
<td style="text-align: left;"><em>been</em></td>
</tr>
</tbody>
</table>
<p>Instead of optional parameters, a single short alias, the part-of-speech tag, or&nbsp;<span class="inline_code">PARTICIPLE</span>&nbsp;or <span class="inline_code">PAST+PARTICIPLE</span> can also be given. With no parameters, the infinitive form of the verb is returned.</p>
<p>For example:</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.en import conjugate, lemma, lexeme
&gt;&gt;&gt;
&gt;&gt;&gt; print lexeme('purr')
&gt;&gt;&gt; print lemma('purring')
&gt;&gt;&gt; print conjugate('purred', '3sg') # he / she / it
['purr', 'purrs', 'purring', 'purred']
purr
purrs
</pre></div>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.en import tenses, PAST, PL
&gt;&gt;&gt;
&gt;&gt;&gt; print 'p' in tenses('purred') # By alias.
&gt;&gt;&gt; print PAST in tenses('purred')
&gt;&gt;&gt; print (PAST, 1, PL) in tenses('purred')
True
True
True </pre></div>
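<p>Each alias in the table is shorthand for one combination of parameters. As a minimal sketch in plain Python (independent of pattern.en; the <span class="inline_code">ALIASES</span> dictionary and the <span class="inline_code">expand()</span> helper are hypothetical names, and the string constants only stand in for the module's own), some of the rows can be written as a lookup table:</p>

```python
# Stand-in string constants; pattern.en defines its own values for these names.
INFINITIVE, PRESENT, PAST = "infinitive", "present", "past"
SG, PL = "singular", "plural"
INDICATIVE = "indicative"
IMPERFECTIVE, PROGRESSIVE = "imperfective", "progressive"

# alias: (tense, person, number, mood, aspect), copied from rows of the table.
ALIASES = {
    "inf":   (INFINITIVE, None, None, None, None),
    "1sg":   (PRESENT, 1, SG, INDICATIVE, IMPERFECTIVE),
    "2sg":   (PRESENT, 2, SG, INDICATIVE, IMPERFECTIVE),
    "3sg":   (PRESENT, 3, SG, INDICATIVE, IMPERFECTIVE),
    "pl":    (PRESENT, None, PL, INDICATIVE, IMPERFECTIVE),
    "part":  (PRESENT, None, None, INDICATIVE, PROGRESSIVE),
    "p":     (PAST, None, None, None, None),
    "ppl":   (PAST, None, PL, INDICATIVE, IMPERFECTIVE),
    "ppart": (PAST, None, None, INDICATIVE, PROGRESSIVE),
}

def expand(alias):
    """Return the parameters behind an alias as a keyword dictionary."""
    tense, person, number, mood, aspect = ALIASES[alias]
    return dict(tense=tense, person=person, number=number, mood=mood, aspect=aspect)

print(expand("3sg"))
```

<p>A dictionary like this shows why a single alias can replace five keyword arguments: the alias fixes all of them at once.</p>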
<p><span class="small"><span style="text-decoration: underline;">Reference</span>: <em>XTAG English morphology</em> (1999), University of Pennsylvania, http://www.cis.upenn.edu/~xtag</span></p>
<p>&nbsp;<br /><span class="smallcaps">Rule-based conjugation</span></p>
<p>All verb functions have an optional <span class="inline_code">parse</span>&nbsp;parameter (<span class="inline_code">True</span> by default) that enables a rule-based parser for unknown verbs. This will not work for irregular verbs, and it is fragile for verbs ending in -e in the past tense, or the present participle. The overall accuracy of the algorithm is 91%.</p>
<p>With <span class="inline_code">parse=False</span>,&nbsp;<span class="inline_code">conjugate()</span>&nbsp;and&nbsp;<span class="inline_code">lemma()</span>&nbsp;yield&nbsp;<span class="inline_code">None</span> for verbs that are not in the lexicon:</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.en import verbs, conjugate, PARTICIPLE
&gt;&gt;&gt;
&gt;&gt;&gt; print 'google' in verbs.infinitives
&gt;&gt;&gt; print 'googled' in verbs.inflections
&gt;&gt;&gt;
&gt;&gt;&gt; print conjugate('googled', tense=PARTICIPLE, parse=False)
&gt;&gt;&gt; print conjugate('googled', tense=PARTICIPLE, parse=True)
False
False
None
googling
</pre></div>
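<p>The idea of a lexicon lookup with a rule-based fallback can be sketched in a few lines of plain Python. The toy <span class="inline_code">LEXICON</span> and the single suffix rule below are illustrative assumptions and far simpler than the module's own rules:</p>

```python
# Toy lexicon: infinitive to present participle. Illustrative only.
LEXICON = {"be": "being", "purr": "purring", "have": "having"}

def present_participle(verb, parse=True):
    """Look the verb up in the lexicon; optionally fall back on a naive rule."""
    if verb in LEXICON:
        return LEXICON[verb]
    if not parse:
        return None  # Unknown verb and no rule-based fallback requested.
    # Naive rule: drop a final -e (but keep -ee), then add -ing.
    if verb.endswith("e") and not verb.endswith("ee"):
        return verb[:-1] + "ing"
    return verb + "ing"

print(present_participle("purr"))                   # Found in the lexicon.
print(present_participle("google", parse=False))    # Unknown, no fallback.
print(present_participle("google", parse=True))     # Unknown, rule applied.
```

<p>A rule this simple also shows why such fallbacks are fragile: it does not double consonants (<em>stop</em> would become <em>stoping</em>) and knows nothing about irregular verbs.</p>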
<p>&nbsp;</p>
<hr />
<h2><a name="quantify"></a>Quantification</h2>
<p>The <span class="inline_code">number()</span> function returns a <span class="inline_code">float</span> or <span class="inline_code">int</span> parsed from the given (numeric) string. If no number can be parsed from the string, it returns <span class="inline_code">0</span>.</p>
<p>The <span class="inline_code">numerals()</span> function returns the given <span class="inline_code">int</span> or <span class="inline_code">float</span> as a string of numerals. By default, the fraction is rounded to two decimals.</p>
<p>The <span class="inline_code">quantify()</span> function returns a word count approximation. Two similar words are a <em>pair</em>, three to eight <em>several</em>, and so on. Words can be given as a list, a word → count dictionary, or as a single word + amount.</p>
<p>The <span class="inline_code">reflect()</span> function quantifies Python objects; see the examples bundled with the module.</p>
<pre class="brush:python; gutter:false; light:true;">number(string) # "seventy-five point two" =&gt; 75.2</pre><pre class="brush:python; gutter:false; light:true;">numerals(n, round=2) # 2.245 =&gt; "two point twenty-five"</pre><pre class="brush:python; gutter:false; light:true;">quantify([word1, word2, ...], plural={})</pre><pre class="brush:python; gutter:false; light:true;">reflect(object, quantify=True, replace=[])
</pre><div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.en import quantify
&gt;&gt;&gt;
&gt;&gt;&gt; print quantify(['goose', 'goose', 'duck', 'chicken', 'chicken', 'chicken'])
&gt;&gt;&gt; print quantify({'carrot': 100, 'parrot': 20})
&gt;&gt;&gt; print quantify('carrot', amount=1000)
several chickens, a pair of geese and a duck
dozens of carrots and a score of parrots
hundreds of carrots
</pre></div>
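<p>As a rough sketch of the approach (plain Python, not the module's implementation), the three accepted input forms can first be normalized to a word → count dictionary, after which each count is mapped to an approximate quantifier. Only the <em>pair</em> and <em>several</em> ranges come from the description above; the helper names and the <span class="inline_code">"many"</span> placeholder for larger counts are assumptions, and pluralization is left out:</p>

```python
from collections import Counter

def approximate(count):
    """Map a count to a rough quantifier."""
    if count <= 0:
        return "no"
    if count == 1:
        return "a"
    if count == 2:
        return "a pair of"
    if 3 <= count <= 8:
        return "several"
    return "many"  # Placeholder for all larger ranges.

def as_counts(words, amount=None):
    """Normalize a list, a dictionary, or a single word + amount to word counts."""
    if isinstance(words, str):
        return {words: amount if amount is not None else 1}
    if isinstance(words, dict):
        return dict(words)
    return dict(Counter(words))

counts = as_counts(["goose", "goose", "duck", "chicken", "chicken", "chicken"])
for word, n in sorted(counts.items(), key=lambda item: -item[1]):
    print(approximate(n), word)
```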
<p>&nbsp;</p>
<hr />
<h2><a name="spelling"></a>Spelling</h2>
<p>The <span class="inline_code">suggest()</span> function returns a list of spelling suggestions for a given word. Each suggestion is a <span class="inline_code">(word,</span> <span class="inline_code">confidence)</span>-tuple. It is about 70% accurate.</p>
<pre class="brush: python;gutter: false; light: true; fontsize: 100; first-line: 1; ">suggest(string)</pre><div class="example">
<pre class="brush: python;gutter: false; light: true; fontsize: 100; first-line: 1; ">&gt;&gt;&gt; from pattern.en import suggest
&gt;&gt;&gt; print suggest("parot")
[("part", 0.99), ("parrot", 0.01)]</pre></div>
<p><span class="small"><span style="text-decoration: underline;">Reference</span>: Norvig, P. (2007). <em>How to Write a Spelling Corrector</em>. http://norvig.com/spell-correct.html</span>&nbsp;</p>
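<p>The referenced approach generates all strings one edit away from the input and ranks the known ones by frequency. The sketch below follows that idea in plain Python; <span class="inline_code">rank_candidates()</span> is a hypothetical helper and the frequency counts are made up for illustration, so this is not the module's own code or data:</p>

```python
import string

def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def rank_candidates(word, frequencies):
    """Return (word, confidence) tuples for known words one edit away."""
    candidates = [w for w in edits1(word) if w in frequencies] or [word]
    total = float(sum(frequencies.get(w, 1) for w in candidates))
    scored = [(w, frequencies.get(w, 1) / total) for w in candidates]
    return sorted(scored, key=lambda pair: -pair[1])

# Made-up frequency counts, for illustration only.
frequencies = {"part": 99, "parrot": 1, "carrot": 50}
print(rank_candidates("parot", frequencies))
```

<p>With these counts the frequent word <em>part</em> outranks <em>parrot</em>, which illustrates why a purely frequency-based corrector can prefer a common word over the intended one.</p>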
<p>&nbsp;</p>
<hr />
<h2><em><a name="ngram"></a>n</em>-grams</h2>
<p>The <span class="inline_code">ngrams()</span> function returns&nbsp;a list of <em>n</em>-grams (i.e., tuples of <em>n</em> successive words) from the given string.&nbsp;Alternatively, you can supply a <span class="inline_code">Text</span> or <span class="inline_code">Sentence</span> object (see further). Punctuation marks are stripped from words, and&nbsp;<em>n</em>-grams will not run over sentence delimiters (i.e., .!?), unless <span class="inline_code">continuous</span> is <span class="inline_code">True</span>.</p>
<pre class="brush:python; gutter:false; light:true;">ngrams(string, n=3, punctuation=".,;:!?()[]{}`''\"@#$^&amp;*+-|=~_", continuous=False)</pre><div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.en import ngrams
&gt;&gt;&gt; print ngrams("I am eating pizza.", n=2) # bigrams
[('I', 'am'), ('am', 'eating'), ('eating', 'pizza')] </pre></div>
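<p>To make the behaviour concrete, here is a minimal plain-Python sketch of the same idea. The function name <span class="inline_code">simple_ngrams()</span> is hypothetical, and splitting sentences on <span class="inline_code">.!?</span> with a regular expression is a simplifying assumption that will mishandle abbreviations and decimal numbers:</p>

```python
import re

def simple_ngrams(text, n=3, punctuation=".,;:!?()[]{}`'\"@#$^&*+-|=~_", continuous=False):
    """Return n-grams as tuples of n successive words."""
    # Split on sentence delimiters unless n-grams may run across them.
    sentences = [text] if continuous else re.split(r"[.!?]+", text)
    grams = []
    for sentence in sentences:
        words = [w.strip(punctuation) for w in sentence.split()]
        words = [w for w in words if w]  # Drop tokens that were only punctuation.
        grams.extend(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return grams

print(simple_ngrams("I am eating pizza.", n=2))
print(simple_ngrams("Hi there. Bye now.", n=2))
print(simple_ngrams("Hi there. Bye now.", n=2, continuous=True))
```

<p>The last two calls differ only in <span class="inline_code">continuous</span>: with <span class="inline_code">True</span>, the bigram <em>there Bye</em> that spans the sentence break is included.</p>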
<p>&nbsp;</p>
<hr />
<h2><a name="parser"></a>Parser</h2>
<p>A parser identifies sentences, words and word types in a string of text. This involves tokenization (distinguishing between abbreviations and sentence breaks), part-of-speech tagging (annotating words with their type, e.g., is <em>can</em> a <span class="postag">noun</span> or a <span class="postag">verb</span>?) and chunking (grouping consecutive words that belong together). Parsing can be used to answer questions such as <em>who did what and why</em> and is useful in a wide range of text mining applications.&nbsp;The pattern.en parser uses a lexicon of 100,000 known words and their part-of-speech <a class="link-maintenance" href="MBSP-tags.html" target="_blank">tag</a>, along with rules for unknown words based on word suffix (e.g., <em>-ly</em> = <span class="postag">ADVERB</span>) and context (surrounding words). This approach is fast but not always accurate, since many words are ambiguous and hard to capture with simple rules. The overall accuracy is about 95% (95.8% on WSJ portions 22-24). It is lower for informal language use (e.g., chat language).</p>
<p>The <span class="inline_code">parse()</span> function takes a string of text and returns a part-of-speech tagged Unicode string. Sentences in the output are separated by newline characters.</p>
<pre class="brush:python; gutter:false; light:true;">parse(string,
tokenize = True, # Split punctuation marks from words?
tags = True, # Parse part-of-speech tags? (NN, JJ, ...)
chunks = True, # Parse chunks? (NP, VP, PNP, ...)
relations = False, # Parse chunk relations? (-SBJ, -OBJ, ...)
lemmata = False, # Parse lemmata? (ate =&gt; eat)
encoding = 'utf-8', # Input string encoding.
tagset = None) # Penn Treebank II (default) or UNIVERSAL.
</pre><p>For example:</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.en import parse
&gt;&gt;&gt; print parse('I eat pizza with a fork.')
I/PRP/B-NP/O eat/VBP/B-VP/O pizza/NN/B-NP/O with/IN/B-PP/B-PNP a/DT/B-NP/I-PNP
fork/NN/I-NP/I-PNP ././O/O
</pre></div>
<ul>
<li>With&nbsp;<span class="inline_code">tags</span><span class="inline_code">=True</span> each word is annotated with a part-of-speech tag.&nbsp;</li>
<li>With <span class="inline_code">chunks=True</span>&nbsp;each word is annotated with a chunk tag and a&nbsp;<span class="postag">PNP</span> tag (prepositional noun phrase, <span class="postag">PP</span> + <span class="postag">NP</span>). The <span class="inline_code postag">O</span> tag (= outside) means that the word is not part of a chunk.</li>
<li>With <span class="inline_code">relations=True</span>&nbsp;each word is annotated with a role tag (e.g., <span class="postag">-SBJ</span>&nbsp;for subject or <span class="postag">-OBJ</span>&nbsp;for object).</li>
<li>With <span class="inline_code">lemmata=True</span> each word is annotated with its base form.&nbsp;</li>
<li>With <span class="inline_code">tokenize=False</span>, punctuation marks will not be separated from words. <br />The input string is then expected to be tokenized beforehand; otherwise sentence delimiters are not detected.</li>
</ul>
<p><span class="small"><span style="text-decoration: underline;">Reference</span>: Brill, E. (1992). <em>A simple rule-based part of speech tagger.</em> ANLC '92 Proceedings.</span></p>
<h3>Parser tags</h3>
<p>Let's examine the word <em>fork</em> and the tags assigned by the parser in the example above:</p>
<table class="border">
<tbody>
<tr>
<td class="smallcaps" style="text-align: center;" align="center">word</td>
<td class="smallcaps" style="text-align: center;" align="center">part-of-speech</td>
<td class="smallcaps" style="text-align: center;" align="center">chunk</td>
<td class="smallcaps" style="text-align: center;" align="center">pnp</td>
</tr>
<tr>
<td align="center">fork</td>
<td align="center"><span class="postag">NN </span></td>
<td align="center"><span class="postag">I-NP</span></td>
<td align="center"><span class="postag">I-PNP</span></td>
</tr>
</tbody>
</table>
<p>The word's part-of-speech tag is <span class="postag">NN</span>, which means that it is a noun. The word occurs in a <span class="postag">NP</span> chunk, a noun phrase (i.e., <em>a fork</em>). It is also part of a prepositional noun phrase (i.e., <em><span style="text-decoration: underline;">with</span> a fork</em>).</p>
<p>Common part-of-speech tags are&nbsp;<span class="postag">NN</span> (noun), <span class="postag">VB</span> (verb),&nbsp;<span class="postag">JJ</span> (adjective), <span class="postag">RB</span> (adverb)&nbsp;and&nbsp;<span class="postag">IN</span> (preposition).<br />Common chunk tags are&nbsp;<span class="postag">NP</span> (noun phrase) and <span class="postag">VP</span> (verb phrase).<br />Common chunk relations are <span class="postag">NP-SBJ</span> (subject) and <span class="postag">NP-OBJ</span> (object).</p>
<p>The <a class="link-maintenance" href="MBSP-tags.html" target="_blank">Penn Treebank II tagset</a>&nbsp;gives an overview of all the possible tags generated by the parser.</p>
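<p>Many Penn Treebank tags refine a common two-letter prefix (for example, <span class="postag">VBD</span> and <span class="postag">VBZ</span> are both verbs). A small plain-Python sketch of grouping tags by that prefix follows; the <span class="inline_code">COARSE</span> mapping and <span class="inline_code">coarse_type()</span> helper are hypothetical and cover only the common tags listed above:</p>

```python
# Coarse word types for the common tag prefixes mentioned in the text.
COARSE = {"NN": "noun", "VB": "verb", "JJ": "adjective", "RB": "adverb", "IN": "preposition"}

def coarse_type(pos):
    """Map a part-of-speech tag to a coarse word type by its first two letters."""
    return COARSE.get(pos[:2], "other")

tagged = [("I", "PRP"), ("ate", "VBD"), ("pizza", "NN"), ("happily", "RB")]
print([(word, coarse_type(pos)) for word, pos in tagged])
```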
<h3>Parser tagger &amp; tokenizer</h3>
<p>The <span class="inline_code">tokenize()</span> function returns a list of sentences, with punctuation marks split from words. It takes an optional&nbsp;<span class="inline_code">replace</span>&nbsp;dictionary, by default used to split contractions, i.e.,&nbsp;<span class="inline_code">{"'ve":</span>&nbsp;<span class="inline_code">"&nbsp;</span><span class="inline_code">'ve"</span><span class="inline_code">,</span> <span class="inline_code">...}</span>.</p>
<p>The <span class="inline_code">tag()</span> function simply annotates words with their part-of-speech tag and returns a list of <span class="inline_code">(word,</span> <span class="inline_code">tag)</span>-tuples:</p>
<pre class="brush: python;gutter: false; light: true; fontsize: 100; first-line: 1; ">tokenize(string, punctuation=".,;:!?()[]{}`''\"@#$^&amp;*+-|=~_", replace={})</pre><pre class="brush:python; gutter:false; light:true;">tag(string, tokenize=True, encoding='utf-8')</pre><div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.en import tag
&gt;&gt;&gt;
&gt;&gt;&gt; for word, pos in tag('I feel *happy*!'):
&gt;&gt;&gt;     if pos == "JJ": # Retrieve all adjectives.
&gt;&gt;&gt;         print word
happy</pre></div>
<h3>Parser output</h3>
<p>The output of&nbsp;<span class="inline_code">parse()</span>&nbsp;is a string of sentences in which each word has been annotated with the requested tags. The <span class="inline_code">pprint()</span> function gives a human-readable breakdown of the tags (the extra <em>p-</em> is for <em>pretty</em>).</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.en import parse
&gt;&gt;&gt; from pattern.en import pprint
&gt;&gt;&gt;
&gt;&gt;&gt; pprint(parse('I ate pizza.', relations=True, lemmata=True))
WORD    TAG   CHUNK   ROLE   ID   PNP   LEMMA
I       PRP   NP      SBJ    1    -     i
ate     VBD   VP      -      1    -     eat
pizza   NN    NP      OBJ    1    -     pizza
.       .     -       -      -    -     . </pre></div>
<p>The output of <span class="inline_code">parse()</span> is a subclass of <span class="inline_code">unicode</span> called&nbsp;<span class="inline_code">TaggedString</span>&nbsp;whose&nbsp;<span class="inline_code">TaggedString.split()</span> method by default yields a list of sentences, where each sentence is a list of tokens, where each token is a list of the word + its tags.</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.en import parse
&gt;&gt;&gt; print parse('I ate pizza.').split()
[[[u'I', u'PRP', u'B-NP', u'O'],
[u'ate', u'VBD', u'B-VP', u'O'],
[u'pizza', u'NN', u'B-NP', u'O'],
[u'.', u'.', u'O', u'O']]] </pre></div>
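<p>Because the tagged string is plain slash-delimited text, the same nested structure can be recovered without the library. The sketch below is a plain-Python approximation (the <span class="inline_code">split_tagged()</span> helper is hypothetical); it assumes words themselves contain no slash, which a real tagged string has to handle by escaping:</p>

```python
def split_tagged(tagged):
    """Split a slash-delimited tagged string into sentences, tokens and fields."""
    return [[token.split("/") for token in line.split()]
            for line in tagged.splitlines() if line.strip()]

tagged = "I/PRP/B-NP/O ate/VBD/B-VP/O pizza/NN/B-NP/O ././O/O"
for word, pos, chunk, pnp in split_tagged(tagged)[0]:
    print(word, pos, chunk, pnp)
```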
<p>The most convenient way to analyze and mine the output is to construct&nbsp;a <a href="#tree" target="_self">parse tree</a>.</p>
<p>&nbsp;</p>
<hr />
<h2><a name="tree"></a>Parse trees</h2>
<p>A parse tree stores a tagged string as a tree of nested objects that can be traversed to analyze the constituents in the text. The <span class="inline_code">parsetree()</span> function takes the same parameters as <span class="inline_code">parse()</span> and returns a <span class="inline_code">Text</span> object.&nbsp;A&nbsp;<span class="inline_code">Text</span> is a list of <span class="inline_code">Sentence</span> objects. Each <span class="inline_code">Sentence</span> is a list of <span class="inline_code">Word</span> objects. <span class="inline_code">Word</span> objects can be grouped in <span class="inline_code">Chunk</span> objects, which are related to other <span class="inline_code">Chunk</span> objects.</p>
<pre class="brush: python;gutter: false; light: true; fontsize: 100; first-line: 1; ">parsetree(string,
tokenize = True, # Split punctuation marks from words?
tags = True, # Parse part-of-speech tags? (NN, JJ, ...)
chunks = True, # Parse chunks? (NP, VP, PNP, ...)
relations = False, # Parse chunk relations? (-SBJ, -OBJ, ...)
lemmata = False, # Parse lemmata? (ate =&gt; eat)
encoding = 'utf-8', # Input string encoding.
tagset = None) # Penn Treebank II (default) or UNIVERSAL.
</pre><p>The following example shows the parse tree for the sentence "<em>The cat sat on the mat.</em>":</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.en import parsetree
&gt;&gt;&gt;
&gt;&gt;&gt; s = parsetree('The cat sat on the mat.', relations=True, lemmata=True)
&gt;&gt;&gt; print repr(s)
[Sentence(
u'The/DT/B-NP/O/NP-SBJ-1/the
cat/NN/I-NP/O/NP-SBJ-1/cat
sat/VBD/B-VP/O/VP-1/sit
on/IN/B-PP/B-PNP/O/on
the/DT/B-NP/I-PNP/O/the
mat/NN/I-NP/I-PNP/O/mat
././O/O/O/O/.')]</pre><pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; for sentence in s:
&gt;&gt;&gt;     for chunk in sentence.chunks:
&gt;&gt;&gt;         print chunk.type, [(w.string, w.type) for w in chunk.words]
NP [(u'The', u'DT'), (u'cat', u'NN')]
VP [(u'sat', u'VBD')]
PP [(u'on', u'IN')]
NP [(u'the', u'DT'), (u'mat', u'NN')]
</pre></div>
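<p>The chunk tags in the tagged string use a begin/inside scheme: <span class="postag">B-NP</span> opens a noun phrase, <span class="postag">I-NP</span> continues it, and <span class="postag">O</span> is outside any chunk. As a plain-Python sketch of how words could be grouped from those tags (the <span class="inline_code">chunks_from_tags()</span> helper is hypothetical and not the library's code):</p>

```python
def chunks_from_tags(tokens):
    """Group (word, pos, chunk_tag) tokens into (chunk_type, words) pairs."""
    chunks, current = [], None
    for word, pos, tag in tokens:
        if tag.startswith("B-"):
            current = (tag[2:], [word])   # B- begins a new chunk.
            chunks.append(current)
        elif tag.startswith("I-") and current is not None and current[0] == tag[2:]:
            current[1].append(word)       # I- continues the open chunk.
        else:
            current = None                # O (or a stray I-) closes the chunk.
    return chunks

tokens = [("The", "DT", "B-NP"), ("cat", "NN", "I-NP"), ("sat", "VBD", "B-VP"),
          ("on", "IN", "B-PP"), ("the", "DT", "B-NP"), ("mat", "NN", "I-NP"),
          (".", ".", "O")]
for chunk_type, words in chunks_from_tags(tokens):
    print(chunk_type, words)
```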
<p>A common approach is to store output from <span class="inline_code">parse()</span>&nbsp;in a .txt file, with a tagged sentence on each line.&nbsp;The <span class="inline_code">tree()</span> function can be used to load it as a <span class="inline_code">Text</span> object. It has an optional <span class="inline_code">token</span> parameter that defines the format of the tokens (tagged words).&nbsp;So&nbsp;<span class="inline_code">parsetree(s)</span>&nbsp;is the same as&nbsp;<span class="inline_code">tree(parse(s)</span><span class="inline_code">)</span>.</p>
<pre class="brush: python;gutter: false; light: true; fontsize: 100; first-line: 1; ">tree(taggedstring, token=[WORD, POS, CHUNK, PNP, REL, LEMMA])</pre><div class="example">
<pre class="brush: python;gutter: false; light: true; fontsize: 100; first-line: 1; ">&gt;&gt;&gt; from pattern.en import tree, WORD, POS, CHUNK
&gt;&gt;&gt;
&gt;&gt;&gt; for sentence in tree(open('tagged.txt'), token=[WORD, POS, CHUNK]):
&gt;&gt;&gt;     print sentence</pre></div>
<h3>Text</h3>
<p>A <span class="inline_code">Text</span> is a list of <span class="inline_code">Sentence</span> objects (i.e., it can be iterated with&nbsp;<span class="inline_code">for</span> <span class="inline_code">sentence</span> <span class="inline_code">in</span> <span class="inline_code">text:</span>).</p>
<pre class="brush:python; gutter:false; light:true;">text = Text(taggedstring, token=[WORD, POS, CHUNK, PNP, REL, LEMMA])</pre><pre class="brush:python; gutter:false; light:true;">text = Text.from_xml(xml) # Reads an XML string generated with Text.xml.
</pre><pre class="brush:python; gutter:false; light:true;">text.string # 'The cat sat on the mat .'
text.sentences # [Sentence('The cat sat on the mat .')]
text.copy()
text.xml</pre><h3>Sentence</h3>
<p>A <span class="inline_code">Sentence</span> is a list of <span class="inline_code">Word</span> objects, with attributes and methods that group words in <span class="inline_code">Chunk</span> objects.</p>
<pre class="brush:python; gutter:false; light:true;">sentence = Sentence(taggedstring, token=[WORD, POS, CHUNK, PNP, REL, LEMMA])</pre><pre class="brush:python; gutter:false; light:true;">sentence = Sentence.from_xml(xml)
</pre><pre class="brush:python; gutter:false; light:true;">sentence.parent # Sentence parent, or None.
sentence.id # Unique id for each sentence.
sentence.start # 0
sentence.stop # len(Sentence).
</pre><pre class="brush:python; gutter:false; light:true;">sentence.string # Tokenized string, without tags.
sentence.words # List of Word objects.
sentence.lemmata # List of word lemmata.
sentence.chunks # List of Chunk objects.
sentence.subjects # List of NP-SBJ chunks.
sentence.objects # List of NP-OBJ chunks.
sentence.verbs # List of VP chunks.
sentence.relations # {'SBJ': {1: Chunk('the cat/NP-SBJ-1')},
# 'VP': {1: Chunk('sat/VP-1')},
# 'OBJ': {}}
sentence.pnp # List of PNPChunks: [Chunk('on the mat/PNP')]
</pre><pre class="brush:python; gutter:false; light:true;">sentence.constituents(pnp=False)</pre><pre class="brush:python; gutter:false; light:true;">sentence.slice(start, stop)
sentence.copy()
sentence.xml
</pre><ul>
<li><span class="inline_code">Sentence.constituents()</span> returns a mixed, in-order list of <span class="inline_code">Word</span> and <span class="inline_code">Chunk</span> objects.<br />With <span class="inline_code">pnp=True</span>, it will yield&nbsp;<span class="inline_code">PNPChunk</span> objects whenever possible.</li>
<li><span class="inline_code">Sentence.slice()</span>&nbsp;returns a <span class="inline_code">Slice</span> (= a subclass of <span class="inline_code">Sentence</span>) starting with the word at index <span class="inline_code">start</span> and containing all words up to (not including) index <span class="inline_code">stop</span>.</li>
</ul>
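<p>The shape of <span class="inline_code">Sentence.relations</span> can be illustrated with a plain-Python sketch that indexes chunks by role and relation id. The <span class="inline_code">relations_from_tags()</span> helper and its <span class="inline_code">(string, tag)</span> input format are assumptions made for this example, with tags written as in the parse tree output (<span class="postag">NP-SBJ-1</span>, <span class="postag">VP-1</span>):</p>

```python
def relations_from_tags(chunks):
    """Index (string, relation_tag) chunks by role and relation id."""
    relations = {"SBJ": {}, "VP": {}, "OBJ": {}}
    for string, tag in chunks:
        parts = tag.split("-")
        if parts[-1].isdigit():
            # 'NP-SBJ-1' has an explicit role; 'VP-1' uses the chunk type.
            role = parts[1] if len(parts) == 3 else parts[0]
            relations.setdefault(role, {})[int(parts[-1])] = string
    return relations

chunks = [("the cat", "NP-SBJ-1"), ("sat", "VP-1"), ("on", "O"), ("the mat", "O")]
print(relations_from_tags(chunks))
```

<p>Chunks tagged <span class="postag">O</span> take part in no relation and are skipped, so the <span class="inline_code">'OBJ'</span> entry stays empty for this sentence.</p>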
<h3>Sentence words</h3>
<p>A <span class="inline_code">Sentence</span> is made up of <span class="inline_code">Word</span> objects, which are also grouped in <span class="inline_code">Chunk</span> objects:</p>
<pre class="brush:python; gutter:false; light:true;">word = Word(sentence, string, lemma=None, type=None, index=0)</pre><pre class="brush:python; gutter:false; light:true;">word.sentence # Sentence parent.
word.index # Sentence index of word.
word.string # String (Unicode).
word.lemma # String lemma, e.g. 'sat' =&gt; 'sit'.
word.type # Part-of-speech tag (NN, JJ, VBD, ...)
word.chunk # Chunk parent, or None.
word.pnp # PNPChunk parent, or None.</pre><h3>Sentence chunks</h3>
<p>A <span class="inline_code">Chunk</span> is a list of <span class="inline_code">Word</span> objects that belong together. <br />Multiple chunks can be part of a <span class="inline_code">PNPChunk</span>, which starts with a <span class="postag">PP</span> chunk followed by <span class="postag">NP</span> chunks.</p>
<pre class="brush:python; gutter:false; light:true;">chunk = Chunk(sentence, words=[], type=None, role=None, relation=None)</pre><pre class="brush:python; gutter:false; light:true;">chunk.sentence # Sentence parent.
chunk.start # Sentence index of first word.
chunk.stop # Sentence index of last word + 1.
chunk.string # String of words (Unicode).
chunk.words # List of Word objects.
chunk.lemmata # List of word lemmata.
chunk.head # Primary Word in the chunk.
chunk.type # Chunk tag (NP, VP, PP, ...)
chunk.role # Role tag (SBJ, OBJ, ...)
chunk.relation # Relation id, e.g. NP-SBJ-1 =&gt; 1.
chunk.relations # List of (id, role)-tuples.
chunk.related # List of Chunks with same relation id.
chunk.subject # NP-SBJ chunk with same id.
chunk.object # NP-OBJ chunk with same id.
chunk.verb # VP chunk with same id.
chunk.modifiers # []
chunk.conjunctions # []
chunk.pnp # PNPChunk parent, or None.
</pre><pre class="brush:python; gutter:false; light:true;">chunk.previous(type=None)
chunk.next(type=None)
chunk.nearest(type='VP')</pre><ul>
<li><span class="inline_code">Chunk.head</span> yields the primary&nbsp;<span class="inline_code">Word</span> in the chunk: <em>the big cat</em> → <em>cat</em>.</li>
<li><span class="inline_code">Chunk.relations</span>&nbsp;contains all relations the chunk is part of. <br />Some chunks have multiple relations, e.g., <span class="postag">SBJ</span> as well as&nbsp;<span class="postag">OBJ</span>, or&nbsp;<span class="postag">OBJ</span> of multiple <span class="postag">VP</span>'s.</li>
<li>For <span class="postag">VP</span> chunks, <span class="inline_code">Chunk.modifiers</span> is a list of nearby adjectives and adverbs that have no relations. <br />For example, in <em>the cat purred happily</em>, the modifier of&nbsp;<em>purred</em>&nbsp;is <em>happily</em>.</li>
<li><span class="inline_code">Chunk.conjunctions</span> is a list of chunks linked by <em>and</em>&nbsp;and&nbsp;<em>or</em> to this chunk. <br />For example in <em>up and down</em>: the <em>up</em> chunk has conjunctions: <span class="inline_code">[(Chunk('down'),</span> <span class="inline_code">AND)]</span>.</li>
</ul>
<h3>Prepositional noun phrases</h3>
<p>A <span class="inline_code">PNPChunk</span>&nbsp;or prepositional noun phrase is a subclass of <span class="inline_code">Chunk</span>.&nbsp;It groups <span class="postag">PP</span> + <span class="postag">NP</span> chunks (= <span class="postag">PNP</span>).</p>
<pre class="brush:python; gutter:false; light:true;">pnp = PNPChunk(sentence, words=[], type=None, role=None, relation=None)</pre><pre class="brush:python; gutter:false; light:true;">pnp.string # String of words (Unicode).
pnp.chunks # List of Chunk objects.
pnp.preposition # First PP chunk in the PNP.
</pre><p>Words and chunks that are part of a <span class="postag">PNP</span> will have their <span class="inline_code">Word.pnp</span> and <span class="inline_code">Chunk.pnp</span> attribute set.&nbsp;All prepositional noun phrases in a sentence can be retrieved with <span class="inline_code">Sentence.pnp</span>.</p>
<p>&nbsp;</p>
<hr />
<h2><a name="sentiment"></a>Sentiment</h2>
<p>Written text can be broadly categorized into two types: facts and opinions. Opinions carry people's sentiments, appraisals and feelings toward the world. The pattern.en module bundles a lexicon of adjectives (e.g., <em>good</em>, <em>bad</em>, <em>amazing</em>, <em>irritating</em>, ...) that occur frequently in product reviews, annotated with scores for sentiment polarity (positive ↔&nbsp;negative) and subjectivity (objective ↔ subjective).&nbsp;</p>
<p>The <span class="inline_code">sentiment()</span> function returns a <span class="inline_code">(polarity,</span> <span class="inline_code">subjectivity)</span>-tuple for the given sentence, based on the adjectives it contains,&nbsp;where polarity is a value between <span class="inline_code">-1.0</span> and +<span class="inline_code">1.0</span> and subjectivity between <span class="inline_code">0.0</span> and <span class="inline_code">1.0</span>.&nbsp;The sentence can be a string, <span class="inline_code">Text</span>, <span class="inline_code">Sentence</span>, <span class="inline_code">Chunk</span>,&nbsp;<span class="inline_code">Word</span> or a&nbsp;<span class="inline_code">Synset</span> (see below).&nbsp;</p>
<p>The <span class="inline_code">positive()</span> function returns <span class="inline_code">True</span> if the given sentence's polarity is above the threshold. The threshold can be lowered or raised, but overall <span class="inline_code">+0.1</span> gives the best results for product reviews. Accuracy is about 75% for movie reviews.</p>
<pre class="brush:python; gutter:false; light:true;">sentiment(sentence) # Returns a (polarity, subjectivity)-tuple.</pre><pre class="brush:python; gutter:false; light:true;">positive(s, threshold=0.1) # Returns True if polarity &gt;= threshold.</pre><div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.en import sentiment
&gt;&gt;&gt;
&gt;&gt;&gt; print sentiment(
&gt;&gt;&gt;     "The movie attempts to be surreal by incorporating various time paradoxes, "
&gt;&gt;&gt;     "but it's presented in such a ridiculous way it's seriously boring.")
(-0.34, 1.0) </pre></div>
<p>In the example above,&nbsp;<span class="inline_code">-0.34</span> is the average of&nbsp;<em>surreal</em>, <em>various</em>, <em>ridiculous</em> and <em>seriously boring</em>.&nbsp;To retrieve the scores for individual words, use the special <span class="inline_code">assessments</span> property, which yields a list of <span class="inline_code">(words,</span> <span class="inline_code">polarity,</span> <span class="inline_code">subjectivity,</span> <span class="inline_code">label)</span>-tuples.</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; print sentiment('Wonderfully awful! :-)').assessments
[(['wonderfully', 'awful', '!'], -1.0, 1.0, None),
([':-)'], 0.5, 1.0, 'mood')]
</pre></div>
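<p>How assessments could combine into a single score and a positive/negative decision is sketched below in plain Python. The unweighted mean and the helper names are simplifying assumptions; the module may weight assessments differently, so the numbers need not match <span class="inline_code">sentiment()</span>:</p>

```python
def average_sentiment(assessments):
    """Average polarity and subjectivity over (words, polarity, subjectivity, label) tuples."""
    if not assessments:
        return (0.0, 0.0)
    n = float(len(assessments))
    polarity = sum(a[1] for a in assessments) / n
    subjectivity = sum(a[2] for a in assessments) / n
    return (polarity, subjectivity)

def is_positive(assessments, threshold=0.1):
    """True if the averaged polarity reaches the threshold."""
    return average_sentiment(assessments)[0] >= threshold

assessments = [(["wonderfully", "awful", "!"], -1.0, 1.0, None),
               ([":-)"], 0.5, 1.0, "mood")]
print(average_sentiment(assessments))
print(is_positive(assessments))
```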
<p>&nbsp;&nbsp;</p>
<hr />
<h2><a name="modality"></a>Mood &amp; modality</h2>
<p>Grammatical mood refers to the use of auxiliary verbs (e.g., <em>could</em>, <em>would</em>) and adverbs (e.g., <em>definitely</em>,<em> maybe</em>) to express uncertainty.&nbsp;</p>
<p>The <span class="inline_code">mood()</span> function returns either&nbsp;<span class="inline_code">INDICATIVE</span>, <span class="inline_code">IMPERATIVE</span>, <span class="inline_code">CONDITIONAL</span>&nbsp;or <span class="inline_code">SUBJUNCTIVE</span>&nbsp;for a given parsed&nbsp;<span class="inline_code">Sentence</span>. See the table below for an overview of moods.</p>
<p>The <span class="inline_code">modality()</span> function returns the degree of certainty as a value between <span class="inline_code">-1.0</span> and <span class="inline_code">+1.0</span>, where values <span class="inline_code">&gt;</span> <span class="inline_code">+0.5</span> represent facts. For example, "<em>I wish it would stop raining"</em> scores <span class="inline_code">-0.35</span>, whereas "<em>It will stop raining"</em> scores <span class="inline_code">+0.75</span>. Accuracy is about 68% for Wikipedia texts.</p>
<pre class="brush:python; gutter:false; light:true;">mood(sentence) # Returns INDICATIVE | IMPERATIVE | CONDITIONAL | SUBJUNCTIVE</pre><pre class="brush:python; gutter:false; light:true;">modality(sentence) # Returns -1.0 =&gt; +1.0.</pre><table class="border">
<tbody>
<tr>
<td><span class="smallcaps">Mood</span></td>
<td><span class="smallcaps">Form</span></td>
<td><span class="smallcaps">Use</span></td>
<td><span class="smallcaps">Example</span></td>
</tr>
<tr>
<td><span class="inline_code">INDICATIVE</span></td>
<td>none of the below&nbsp;</td>
<td>fact, belief</td>
<td><em>It rains.</em></td>
</tr>
<tr>
<td><span class="inline_code">IMPERATIVE</span></td>
<td>infinitive without <em>to</em></td>
<td>command, warning</td>
<td><em><span style="text-decoration: underline;">Do</span>n't rain!</em></td>
</tr>
<tr>
<td><span class="inline_code">CONDITIONAL</span></td>
<td><em>would</em>, <em>could</em>, <em>should</em>, <em>may</em>, or <em>will</em>,&nbsp;<em>can</em> + <em>if</em></td>
<td>conjecture</td>
<td><em>It <span style="text-decoration: underline;">might</span> rain.</em></td>
</tr>
<tr>
<td><span class="inline_code">SUBJUNCTIVE</span></td>
<td><em>wish</em>, <em>were</em>, or&nbsp;<em>it is</em> + infinitive</td>
<td>wish, opinion</td>
<td><em>I <span style="text-decoration: underline;">hope</span> it rains.</em></td>
</tr>
</tbody>
</table>
<p>For example:</p>
<div class="example">
<pre class="brush: python;gutter: false; light: true; fontsize: 100; first-line: 1; ">&gt;&gt;&gt; from pattern.en import parse, Sentence
&gt;&gt;&gt; from pattern.en import modality
&gt;&gt;&gt;
&gt;&gt;&gt; s = "Some amino acids tend to be acidic while others may be basic." # weaseling
&gt;&gt;&gt; s = parse(s, lemmata=True)
&gt;&gt;&gt; s = Sentence(s)
&gt;&gt;&gt;
&gt;&gt;&gt; print modality(s)
0.11</pre></div>
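<p>To act on a modality score, you could compare it with the <span class="inline_code">+0.5</span> cut-off mentioned above. The sketch below assumes a hypothetical helper <span class="inline_code">is_fact()</span>, which is not part of pattern.en, and reuses the scores quoted in this section.</p>

```python
# Hypothetical helper; the 0.5 cut-off comes from the description above.
def is_fact(modality_score, threshold=0.5):
    """Return True if a modality score in [-1.0, +1.0] represents a fact."""
    if not -1.0 <= modality_score <= 1.0:
        raise ValueError("modality must lie between -1.0 and +1.0")
    return modality_score > threshold


# Scores quoted in this section.
scores = {
    "I wish it would stop raining": -0.35,
    "It will stop raining": 0.75,
    "Some amino acids tend to be acidic while others may be basic.": 0.11,
}
for sentence, score in scores.items():
    print(sentence, is_fact(score))
```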
<p>&nbsp;</p>
<hr />
<h2><a name="wordnet"></a>WordNet</h2>
<p>The pattern.en.wordnet module includes WordNet 3.0 and Oliver Steele's PyWordNet module. <a href="http://wordnet.princeton.edu/" target="_blank">WordNet</a> is a lexical database that groups related words into <span class="inline_code">Synset</span> objects (= sets of synonyms). Each synset provides a short definition and semantic relations to other synsets.</p>
<p>The <span class="inline_code">synsets()</span> function returns a list of <span class="inline_code">Synset</span> objects for a given word, where each set corresponds to a word sense (e.g., <em>tree</em> in the sense of plant, <em>tree</em> in the sense of diagram, etc.)</p>
<pre class="brush:python; gutter:false; light:true;">synset = wordnet.synsets(word, pos=NOUN)[i]</pre><pre class="brush:python; gutter:false; light:true;">synset.pos # Part-of-speech: NOUN | VERB | ADJECTIVE | ADVERB.
synset.synonyms # List of word forms (i.e., synonyms).
synset.gloss # Definition string.
synset.lexname # Category string, or None.
synset.ic # Information Content (float).
</pre><pre class="brush:python; gutter:false; light:true;">synset.antonym # Synset (semantic opposite).
synset.hypernym # Synset (semantic parent).</pre><pre class="brush:python; gutter:false; light:true;">synset.hypernyms(recursive=False, depth=None)
synset.hyponyms(recursive=False, depth=None)
synset.meronyms() # List of synsets (members/parts).
synset.holonyms() # List of synsets (of which this is a member).
synset.similar() # List of synsets (similar adjectives/verbs).</pre><ul>
<li><span class="inline_code">Synset.hypernyms()</span> returns a list of parent synsets (i.e., more general).</li>
<li><span class="inline_code">Synset.hyponyms()</span> returns a list of child synsets (i.e., more specific).<br />With <span class="inline_code">recursive=True</span>, returns parents of parents or children of children.<br />Optionally, returns parents or children recursively up to the given <span class="inline_code">depth</span>.</li>
</ul>
<p>For example:</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.en import wordnet
&gt;&gt;&gt;
&gt;&gt;&gt; s = wordnet.synsets('bird')[0]
&gt;&gt;&gt;
&gt;&gt;&gt; print 'Definition:', s.gloss
&gt;&gt;&gt; print ' Synonyms:', s.synonyms
&gt;&gt;&gt; print ' Hypernyms:', s.hypernyms()
&gt;&gt;&gt; print ' Hyponyms:', s.hyponyms()
&gt;&gt;&gt; print ' Holonyms:', s.holonyms()
&gt;&gt;&gt; print ' Meronyms:', s.meronyms()
Definition: u'warm-blooded egg-laying vertebrates characterized '
'by feathers and forelimbs modified as wings'
Synonyms: [u'bird']
Hypernyms: [Synset(u'vertebrate')]
Hyponyms: [Synset(u'cock'), Synset(u'hen'), ...]
Holonyms: [Synset(u'Aves'), Synset(u'flock')]
Meronyms: [Synset(u'beak'), Synset(u'feather'), ...]</pre></div>
<div class="example"><span class="small"><span style="text-decoration: underline;">Reference</span>: Fellbaum, C. (1998). </span><em class="small">WordNet: An Electronic Lexical Database</em><span class="small">. Cambridge, MIT Press.</span></div>
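<p>The effect of <span class="inline_code">recursive</span> and <span class="inline_code">depth</span> can be sketched on a toy taxonomy. Everything below is an assumption made for illustration: the <span class="inline_code">PARENT</span> dictionary is not WordNet data, each entry has a single parent (real synsets can have several), and <span class="inline_code">depth</span> is read as the number of levels to follow.</p>

```python
# Toy taxonomy (child: parent); the entries are illustrative, not WordNet data.
PARENT = {
    "bird": "vertebrate",
    "vertebrate": "chordate",
    "chordate": "animal",
}


def hypernyms(word, recursive=False, depth=None):
    """Return the parents of word; with recursive=True, walk up the chain.

    depth (an assumption in this sketch) limits how many levels are followed.
    """
    result = []
    level = 0
    while word in PARENT:
        if depth is not None and level >= depth:
            break
        word = PARENT[word]
        result.append(word)
        level += 1
        if not recursive:
            break
    return result


print(hypernyms("bird"))
print(hypernyms("bird", recursive=True))
print(hypernyms("bird", recursive=True, depth=2))
```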
<h3>Synset similarity</h3>
<p>The <span class="inline_code">ancestor()</span> function returns the common ancestor&nbsp;of two synsets.&nbsp;The <span class="inline_code">similarity()</span> function returns the semantic similarity of two synsets as a value between <span class="inline_code">0.0</span> and <span class="inline_code">1.0</span>.</p>
<pre class="brush:python; gutter:false; light:true;">wordnet.ancestor(synset1, synset2)</pre><pre class="brush:python; gutter:false; light:true;">wordnet.similarity(synset1, synset2)
</pre><div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.en import wordnet
&gt;&gt;&gt;
&gt;&gt;&gt; a = wordnet.synsets('cat')[0]
&gt;&gt;&gt; b = wordnet.synsets('dog')[0]
&gt;&gt;&gt; c = wordnet.synsets('box')[0]
&gt;&gt;&gt;
&gt;&gt;&gt; print wordnet.ancestor(a, b)
&gt;&gt;&gt;
&gt;&gt;&gt; print wordnet.similarity(a, a)
&gt;&gt;&gt; print wordnet.similarity(a, b)
&gt;&gt;&gt; print wordnet.similarity(a, c)
Synset('carnivore')
1.0
0.86
0.17 </pre></div>
<p>Similarity is calculated using Lin's formula and Resnik's Information Content (IC). IC values for each synset are derived from word counts in the Brown corpus.</p>
<p><span class="inline_code">lin</span> <span class="inline_code">=</span> <span class="inline_code">2.0</span> <span class="inline_code">*</span> <span class="inline_code">log(ancestor(synset1,</span> <span class="inline_code">synset2).ic)</span> <span class="inline_code">/</span> <span class="inline_code">log(synset1.ic</span> <span class="inline_code">*</span> <span class="inline_code">synset2.ic)</span></p>
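<p>The formula above can be tried out on plain numbers. The sketch below assumes that the <span class="inline_code">ic</span> values lie strictly between 0 and 1, so that both logarithms are negative and the ratio is positive; the three values are made up for illustration and do not come from WordNet or the Brown corpus.</p>

```python
from math import log


def lin_similarity(ic1, ic2, ic_ancestor):
    """Lin's formula as written above, applied to plain floats."""
    return 2.0 * log(ic_ancestor) / log(ic1 * ic2)


# Made-up values between 0 and 1, for illustration only.
cat, dog, carnivore = 0.001, 0.001, 0.01
print(round(lin_similarity(cat, dog, carnivore), 2))

# A synset compared with itself is its own ancestor.
print(round(lin_similarity(cat, cat, cat), 2))
```

<p>A synset compared with itself scores approximately 1.0, since the numerator and the denominator are then the same quantity. The function is undefined when the product of the two values is exactly 1.</p>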
<h3>Synset sentiment</h3>
<p><a href="http://sentiwordnet.isti.cnr.it/" target="_blank">SentiWordNet</a> is a lexical resource for opinion mining, with polarity and subjectivity scores for all WordNet synsets. SentiWordNet is free for non-commercial research purposes. To use SentiWordNet, request a download from the authors and put&nbsp;<span class="inline_code">SentiWordNet*.txt</span> in&nbsp;<span class="inline_code">pattern/en/wordnet/</span>.&nbsp;You can then use the&nbsp;<span class="inline_code">Synset.weight</span> property in your script:</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.en import wordnet
&gt;&gt;&gt; from pattern.en import ADJECTIVE
&gt;&gt;&gt;
&gt;&gt;&gt; print wordnet.synsets('happy', ADJECTIVE)[0].weight
&gt;&gt;&gt; print wordnet.synsets('sad', ADJECTIVE)[0].weight
(0.375, 0.875)
(-0.625, 0.875)
</pre></div>
<p>&nbsp;</p>
<hr />
<h2><a name="wordlist"></a>Wordlists</h2>
<p>The pattern.en module includes a number of general-purpose word lists:</p>
<table class="border">
<tbody>
<tr>
<td><span class="smallcaps">List</span></td>
<td><span class="smallcaps">Description</span></td>
<td style="text-align: center;"><span class="smallcaps">Size</span></td>
<td><span class="smallcaps">Example</span></td>
</tr>
<tr>
<td><span class="inline_code">ACADEMIC</span></td>
<td>English academic words</td>
<td style="text-align: center;">500</td>
<td><em>criterion</em>, <em>proportionally</em>, <em>research</em></td>
</tr>
<tr>
<td><span class="inline_code">BASIC</span></td>
<td>English basic words</td>
<td style="text-align: center;">1,000</td>
<td><em>chicken</em>, <em>pain</em>, <em>road</em></td>
</tr>
<tr>
<td><span class="inline_code">PROFANITY</span></td>
<td>English swear words</td>
<td style="text-align: center;">350</td>
<td>&nbsp;</td>
</tr>
<tr>
<td><span class="inline_code">TIME</span></td>
<td>English time &amp; date words</td>
<td style="text-align: center;">100</td>
<td><em>Christmas</em>, <em>past</em>, <em>saturday</em></td>
</tr>
</tbody>
</table>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.en.wordlist import ACADEMIC
&gt;&gt;&gt;
&gt;&gt;&gt; words = open('paper.txt').read().split()
&gt;&gt;&gt; words = [w for w in words if w not in ACADEMIC] </pre></div>
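<p>A plain <span class="inline_code">split()</span> leaves punctuation and capitals attached to the words, so <em>Research</em> or <em>criterion.</em> would not match a lowercase list entry. The sketch below normalizes each token first. It uses a small stand-in set instead of the real <span class="inline_code">ACADEMIC</span> list, and <span class="inline_code">split_by_wordlist()</span> is a hypothetical helper.</p>

```python
import string

# Stand-in for a word list such as ACADEMIC; three entries from the table above.
WORDLIST = {"criterion", "proportionally", "research"}


def split_by_wordlist(text, wordlist):
    """Return (listed, unlisted) words, ignoring case and edge punctuation."""
    listed, unlisted = [], []
    for token in text.split():
        word = token.strip(string.punctuation).lower()
        if not word:
            continue
        (listed if word in wordlist else unlisted).append(word)
    return listed, unlisted


listed, unlisted = split_by_wordlist("Research needs a clear criterion.", WORDLIST)
print(listed)
print(unlisted)
```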
<p>&nbsp;</p>
<hr />
<h2>See also</h2>
<ul>
<li><a href="http://www.clips.ua.ac.be/pages/MBSP" target="_blank">MBSP</a> (GPL): r<span>obust parser using a memory-based learning approach, in Python.</span></li>
<li><span><a href="http://www.nltk.org/" target="_blank">NLTK</a> (Apache): f</span><span>ull natural language processing toolkit for Python.</span></li>
</ul>
</div>
</div></div>
</div>
</div>
</div>
</div>
</div>
</div>
<script>
SyntaxHighlighter.all();
</script>
</body>
</html>

@ -0,0 +1,579 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>pattern-es</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link type="text/css" rel="stylesheet" href="../clips.css" />
<style>
/* Small fixes because we omit the online layout.css. */
h3 { line-height: 1.3em; }
#page { margin-left: auto; margin-right: auto; }
#header, #header-inner { height: 175px; }
#header { border-bottom: 1px solid #C6D4DD; }
table { border-collapse: collapse; }
#checksum { display: none; }
</style>
<link href="../js/shCore.css" rel="stylesheet" type="text/css" />
<link href="../js/shThemeDefault.css" rel="stylesheet" type="text/css" />
<script language="javascript" src="../js/shCore.js"></script>
<script language="javascript" src="../js/shBrushXml.js"></script>
<script language="javascript" src="../js/shBrushJScript.js"></script>
<script language="javascript" src="../js/shBrushPython.js"></script>
</head>
<body class="node-type-page one-sidebar sidebar-right section-pages">
<div id="page">
<div id="page-inner">
<div id="header"><div id="header-inner"></div></div>
<div id="content">
<div id="content-inner">
<div class="node node-type-page">
<div class="node-inner">
<div class="breadcrumb">View online at: <a href="http://www.clips.ua.ac.be/pages/pattern-es" class="noexternal" target="_blank">http://www.clips.ua.ac.be/pages/pattern-es</a></div>
<h1>pattern.es</h1>
<!-- Parsed from the online documentation. -->
<div id="node-1626" class="node node-type-page"><div class="node-inner">
<div class="content">
<p><span class="big">The pattern.es module contains a fast part-of-speech tagger for Spanish (identifies nouns, adjectives, verbs, etc. in a sentence) and tools for Spanish verb conjugation and noun singularization &amp; pluralization.</span></p>
<p>It can be used by itself or with other&nbsp;<a href="pattern.html">pattern</a>&nbsp;modules:&nbsp;<a href="pattern-web.html">web</a>&nbsp;|&nbsp;<a href="pattern-db.html">db</a>&nbsp;| <a href="pattern-en.html">en</a>&nbsp;|&nbsp;<a href="pattern-search.html">search</a>&nbsp;|&nbsp;<a href="pattern-vector.html">vector</a>&nbsp;|&nbsp;<a href="pattern-graph.html">graph</a>.</p>
<p><img src="../g/pattern_schema_es.gif" alt="" width="620" height="180" /></p>
<hr />
<h2>Documentation</h2>
<p>The functions in this module take the same parameters and return the same values as their counterparts in <a href="pattern-en.html">pattern.en</a>. Refer to the documentation there for more details.&nbsp;&nbsp;</p>
<h3>Noun singularization &amp; pluralization</h3>
<p>For Spanish nouns there is <span class="inline_code">singularize()</span> and <span class="inline_code">pluralize()</span>.&nbsp;The implementation is slightly less robust than the English version (accuracy 94% for singularization and 78% for pluralization).</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.es import singularize, pluralize
&gt;&gt;&gt;
&gt;&gt;&gt; print singularize('gatos')
&gt;&gt;&gt; print pluralize('gato')
gato
gatos </pre></div>
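<p>To give an idea of what a rule-based approach involves, here is a minimal sketch of three textbook rules for regular Spanish plurals. It is not the implementation used by pattern.es; <span class="inline_code">pluralize_sketch()</span> is a hypothetical function that ignores accent shifts (<em>canción</em>, <em>canciones</em>) and invariable nouns (<em>lunes</em>).</p>

```python
def pluralize_sketch(noun):
    """Apply three textbook rules for regular Spanish nouns."""
    if noun.endswith("z"):
        return noun[:-1] + "ces"      # luz -> luces
    if noun[-1] in "aeiou":
        return noun + "s"             # gato -> gatos
    return noun + "es"                # ciudad -> ciudades


for noun in ("gato", "ciudad", "luz"):
    print(noun, pluralize_sketch(noun))
```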
<h3>Verb conjugation</h3>
<p>For Spanish verbs there is <span class="inline_code">conjugate()</span>, <span class="inline_code">lemma()</span>, <span class="inline_code">lexeme()</span> and <span class="inline_code">tenses()</span>.&nbsp;The lexicon for verb conjugation contains about 600 common Spanish verbs, compiled by Fred Jehle. For unknown verbs it will fall back to a rule-based approach with an accuracy of about 84%.&nbsp;</p>
<p>Spanish verbs have more tenses than English verbs. In particular, the plural differs for each person, and there are additional forms for the&nbsp;<span class="inline_code">FUTURE</span>&nbsp;and&nbsp;<span class="inline_code">CONDITIONAL</span>&nbsp;tense, the&nbsp;<span class="inline_code">IMPERATIVE</span>&nbsp;and&nbsp;<span class="inline_code">SUBJUNCTIVE</span>&nbsp;mood and the&nbsp;<span class="inline_code">PERFECTIVE</span>&nbsp;aspect:</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.es import conjugate
&gt;&gt;&gt; from pattern.es import INFINITIVE, PRESENT, PAST, SG, SUBJUNCTIVE, PERFECTIVE
&gt;&gt;&gt;
&gt;&gt;&gt; print conjugate('soy', INFINITIVE)
&gt;&gt;&gt; print conjugate('soy', PRESENT, 1, SG, mood=SUBJUNCTIVE)
&gt;&gt;&gt; print conjugate('soy', PAST, 3, SG)
&gt;&gt;&gt; print conjugate('soy', PAST, 3, SG, aspect=PERFECTIVE)
ser
sea
era
fue </pre></div>
<p>For <span class="inline_code">PAST</span>&nbsp;tense + <span class="inline_code">PERFECTIVE</span>&nbsp;aspect we can also use <span class="inline_code">PRETERITE</span>. For <span class="inline_code">PAST</span>&nbsp;tense + <span class="inline_code">IMPERFECTIVE</span>&nbsp;aspect we can also use <span class="inline_code">IMPERFECT</span>:</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.es import conjugate
&gt;&gt;&gt; from pattern.es import IMPERFECT, PRETERITE
&gt;&gt;&gt;
&gt;&gt;&gt; print conjugate('soy', IMPERFECT, 3, SG)
&gt;&gt;&gt; print conjugate('soy', PRETERITE, 3, SG)
era
fue </pre></div>
<p>&nbsp;The <span class="inline_code">conjugate()</span> function takes the following optional parameters:</p>
<table class="border">
<tbody>
<tr>
<td class="smallcaps">Tense</td>
<td class="smallcaps">Person</td>
<td class="smallcaps">Number</td>
<td class="smallcaps">Mood</td>
<td class="smallcaps">Aspect</td>
<td class="smallcaps">Alias</td>
<td class="smallcaps">Example</td>
</tr>
<tr>
<td class="inline_code">INFINITIVE</td>
<td class="inline_code">None</td>
<td class="inline_code">None</td>
<td class="inline_code">None</td>
<td class="inline_code">None</td>
<td class="inline_code">"inf"</td>
<td><em>ser</em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1sg"</td>
<td><em>yo&nbsp;<span style="text-decoration: underline;">soy</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sg"</td>
<td><em>tú&nbsp;<span style="text-decoration: underline;">eres</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sg"</td>
<td><em>él&nbsp;<span style="text-decoration: underline;">es</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1pl"</td>
<td><em>nosotros&nbsp;<span style="text-decoration: underline;">somos</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2pl"</td>
<td><em>vosotros&nbsp;<span style="text-decoration: underline;">sois</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3pl"</td>
<td><em>ellos&nbsp;<span style="text-decoration: underline;">son</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">None</td>
<td class="inline_code">None</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PROGRESSIVE</td>
<td class="inline_code">"part"</td>
<td><em>siendo</em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">IMPERATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sg!"</td>
<td><em>sé</em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">IMPERATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2pl!"</td>
<td><em>sed</em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1sg?"</td>
<td><em>yo&nbsp;<span style="text-decoration: underline;">sea</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sg?"</td>
<td><em>tú&nbsp;<span style="text-decoration: underline;">seas</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sg?"</td>
<td><em>él&nbsp;<span style="text-decoration: underline;">sea</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1pl?"</td>
<td><em>nosotros&nbsp;<span style="text-decoration: underline;">seamos</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2pl?"</td>
<td><em>vosotros&nbsp;<span style="text-decoration: underline;">seáis</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3pl?"</td>
<td><em>ellos&nbsp;<span style="text-decoration: underline;">sean</span></em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1sgp"</td>
<td><em>yo&nbsp;<span style="text-decoration: underline;">era</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sgp"</td>
<td><em>tú&nbsp;<span style="text-decoration: underline;">eras</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sgp"</td>
<td><em>él&nbsp;<span style="text-decoration: underline;">era</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1ppl"</td>
<td><em>nosotros&nbsp;<span style="text-decoration: underline;">éramos</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2ppl"</td>
<td><em>vosotros&nbsp;<span style="text-decoration: underline;">erais</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3ppl"</td>
<td><em>ellos&nbsp;<span style="text-decoration: underline;">eran</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">None</td>
<td class="inline_code">None</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PROGRESSIVE</td>
<td class="inline_code">"ppart"</td>
<td><em>sido</em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PERFECTIVE</td>
<td class="inline_code">"1sgp+"</td>
<td><em>yo&nbsp;<span style="text-decoration: underline;">fui</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PERFECTIVE</td>
<td class="inline_code">"2sgp+"</td>
<td><em>tú&nbsp;<span style="text-decoration: underline;">fuiste</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PERFECTIVE</td>
<td class="inline_code">"3sgp+"</td>
<td><em>él&nbsp;<span style="text-decoration: underline;">fue</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PERFECTIVE</td>
<td class="inline_code">"1ppl+"</td>
<td><em>nosotros&nbsp;<span style="text-decoration: underline;">fuimos</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PERFECTIVE</td>
<td class="inline_code">"2ppl+"</td>
<td><em>vosotros&nbsp;<span style="text-decoration: underline;">fuisteis</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PERFECTIVE</td>
<td class="inline_code">"3ppl+"</td>
<td><em>ellos&nbsp;<span style="text-decoration: underline;">fueron</span></em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1sgp?"</td>
<td><em>yo&nbsp;<span style="text-decoration: underline;">fuera</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sgp?"</td>
<td><em>tú&nbsp;<span style="text-decoration: underline;">fueras</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sgp?"</td>
<td><em>él&nbsp;<span style="text-decoration: underline;">fuera</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1ppl?"</td>
<td><em>nosotros&nbsp;<span style="text-decoration: underline;">fuéramos</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2ppl?"</td>
<td><em>vosotros&nbsp;<span style="text-decoration: underline;">fuerais</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3ppl?"</td>
<td><em>ellos&nbsp;<span style="text-decoration: underline;">fueran</span></em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">FUTURE</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1sgf"</td>
<td><em>yo&nbsp;<span style="text-decoration: underline;">seré</span></em></td>
</tr>
<tr>
<td class="inline_code">FUTURE</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sgf"</td>
<td><em>tú&nbsp;<span style="text-decoration: underline;">serás</span></em></td>
</tr>
<tr>
<td class="inline_code">FUTURE</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sgf"</td>
<td><em>él&nbsp;<span style="text-decoration: underline;">será</span></em></td>
</tr>
<tr>
<td class="inline_code">FUTURE</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1plf"</td>
<td><em>nosotros&nbsp;<span style="text-decoration: underline;">seremos</span></em></td>
</tr>
<tr>
<td class="inline_code">FUTURE</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2plf"</td>
<td><em>vosotros&nbsp;<span style="text-decoration: underline;">seréis</span></em></td>
</tr>
<tr>
<td class="inline_code">FUTURE</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3plf"</td>
<td><em>ellos&nbsp;<span style="text-decoration: underline;">serán</span></em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">CONDITIONAL</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1sg-&gt;"</td>
<td><em>yo&nbsp;<span style="text-decoration: underline;">sería</span></em></td>
</tr>
<tr>
<td class="inline_code">CONDITIONAL</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sg-&gt;"</td>
<td><em>tú&nbsp;<span style="text-decoration: underline;">serías</span></em></td>
</tr>
<tr>
<td class="inline_code">CONDITIONAL</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sg-&gt;"</td>
<td><em>él&nbsp;<span style="text-decoration: underline;">sería</span></em></td>
</tr>
<tr>
<td class="inline_code">CONDITIONAL</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1pl-&gt;"</td>
<td><em>nosotros&nbsp;<span style="text-decoration: underline;">seríamos</span></em></td>
</tr>
<tr>
<td class="inline_code">CONDITIONAL</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2pl-&gt;"</td>
<td><em>vosotros&nbsp;<span style="text-decoration: underline;">seríais</span></em></td>
</tr>
<tr>
<td class="inline_code">CONDITIONAL</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3pl-&gt;"</td>
<td><em>ellos&nbsp;<span style="text-decoration: underline;">serían</span></em></td>
</tr>
</tbody>
</table>
<p>Instead of optional parameters, a single short alias, or&nbsp;<span class="inline_code">PARTICIPLE</span> or <span class="inline_code">PAST+PARTICIPLE</span> can also be given. With no parameters, the infinitive form of the verb is returned.</p>
<p class="small"><span style="text-decoration: underline;">Reference</span><span>: Jehle, F. (2012).&nbsp;<em>Spanish Verb Forms</em>. Retrieved from:&nbsp;</span><span><a class="noexternal" style="color: inherit;" href="http://users.ipfw.edu/jehle/verblist.htm" target="_blank">http://users.ipfw.edu/jehle/verblist.htm</a>.</span></p>
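<p>The alias column works like a key into a lookup table. The sketch below mimics that idea for a handful of forms of <em>ser</em> copied from the table above. The <span class="inline_code">SER</span> dictionary and <span class="inline_code">conjugate_ser()</span> are hypothetical and only illustrate the lookup; the real <span class="inline_code">conjugate()</span> covers all verbs in the lexicon and falls back to rules for unknown ones.</p>

```python
# Forms of "ser" copied from the table above, keyed by alias.
SER = {
    "inf": "ser",
    "1sg": "soy",
    "1sg?": "sea",
    "3sgp": "era",
    "3sgp+": "fue",
    "1sgf": "seré",
}


def conjugate_ser(alias="inf"):
    """Look up a form of 'ser' by alias; defaults to the infinitive."""
    return SER[alias]


print(conjugate_ser())
print(conjugate_ser("3sgp"))
print(conjugate_ser("3sgp+"))
```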
<h3>Attributive &amp; predicative adjectives&nbsp;</h3>
<p>Spanish adjectives inflect with an <span class="inline_code">-o</span>,&nbsp;<span class="inline_code">-a</span>, <span class="inline_code">-os</span>, <span class="inline_code">-as</span>, or <span class="inline_code">-es</span> suffix (e.g., <em>curioso</em> &rarr; <em>los gatos curiosos</em>) depending on gender and number. You can get the base form with the <span class="inline_code">predicative()</span> function, or vice versa with&nbsp;<span class="inline_code">attributive()</span>.&nbsp;For predicative, a statistical approach is used with an accuracy of 93%. For attributive, you need to supply gender (<span class="inline_code">MALE</span>, <span class="inline_code">FEMALE</span>, <span class="inline_code">NEUTRAL</span>&nbsp;and/or <span class="inline_code">PLURAL</span>).</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.es import attributive, predicative
&gt;&gt;&gt; from pattern.es import FEMALE, PLURAL
&gt;&gt;&gt;
&gt;&gt;&gt; print predicative('curiosos')
&gt;&gt;&gt; print attributive('curioso', gender=FEMALE)
&gt;&gt;&gt; print attributive('curioso', gender=FEMALE+PLURAL)
curioso
curiosa
curiosas </pre></div>
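<p>As a rough illustration of what the attributive form involves, the sketch below inflects regular adjectives ending in <span class="inline_code">-o</span> only. It is not how <span class="inline_code">pattern.es</span> works internally; the function name <span class="inline_code">attributive_sketch()</span> and the string values of the gender constants are assumptions made for this example.</p>

```python
# Illustrative only: covers regular adjectives ending in -o, nothing else.
# The constants are stand-ins; their values in pattern.es may differ.
MALE, FEMALE, PLURAL = "m", "f", "pl"

def attributive_sketch(adjective, gender=MALE):
    """Inflect a regular -o adjective for gender and number."""
    if not adjective.endswith("o"):
        return adjective  # out of scope for this sketch
    stem = adjective[:-1]
    suffix = "a" if FEMALE in gender else "o"
    if PLURAL in gender:
        suffix += "s"
    return stem + suffix

print(attributive_sketch("curioso", gender=FEMALE + PLURAL))
```

<p>Combining constants with <span class="inline_code">+</span> mirrors the <span class="inline_code">FEMALE+PLURAL</span> usage shown above; irregular adjectives would need the real function.</p>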
<h3>Parser</h3>
<p>For parsing there is <span class="inline_code" style="font-family: Courier, monospace; font-size: 12px;">parse()</span>, <span class="inline_code">parsetree()</span> and&nbsp;<span class="inline_code" style="font-family: Courier, monospace; font-size: 12px;">split()</span>. The <span class="inline_code">parse()</span> function annotates words in the given string with their part-of-speech <a class="link-maintenance" href="mbsp-tags.html">tags</a>&nbsp;(e.g., <span class="postag">NN</span> for nouns and <span class="postag">VB</span> for verbs). The <span class="inline_code">parsetree()</span> function takes a string and returns a tree of nested objects (<span class="inline_code">Text</span>&nbsp;→ <span class="inline_code">Sentence</span>&nbsp;→ <span class="inline_code">Chunk</span>&nbsp;→ <span class="inline_code">Word</span>). The <span class="inline_code">split()</span> function takes the output of <span class="inline_code">parse()</span> and returns a <span class="inline_code">Text</span>.&nbsp;See the <span class="inline_code">pattern.en</span> documentation&nbsp;(<span class="link-maintenance" style="color: #78aaff;"><a style="color: #8caaff; outline-style: none !important; outline-width: initial !important; outline-color: initial !important;" href="pattern-en.html#tree">here</a></span>) for how to manipulate <span class="inline_code">Text</span>&nbsp;objects.</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.es import parse, split
&gt;&gt;&gt;
&gt;&gt;&gt; s = parse('El gato negro se sienta en la estera.')
&gt;&gt;&gt; for sentence in split(s):
&gt;&gt;&gt;     print sentence
Sentence('El/DT/B-NP/O gato/NN/I-NP/O negro/JJ/I-NP/O'
'se/PRP/B-NP/O sienta/VB/B-VP/O'
'en/IN/B-PP/B-PNP la/DT/B-NP/I-PNP estera/NN/I-NP/I-PNP ././O/O')</pre></div>
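<p>Each token in the output above has the form word/part-of-speech/chunk/PNP. If you only have that tagged string, it can be read back with plain Python. This is a minimal sketch; <span class="inline_code">read_tagged()</span> and its field names are hypothetical helpers for this example, not part of <span class="inline_code">pattern.es</span>.</p>

```python
def read_tagged(tagged, fields=("word", "pos", "chunk", "pnp")):
    """Split 'word/POS/CHUNK/PNP' tokens into one dict per token."""
    tokens = []
    for token in tagged.split():
        # Split from the right so the word itself is kept in one piece.
        parts = token.rsplit("/", len(fields) - 1)
        tokens.append(dict(zip(fields, parts)))
    return tokens

tagged = ("El/DT/B-NP/O gato/NN/I-NP/O negro/JJ/I-NP/O "
          "se/PRP/B-NP/O sienta/VB/B-VP/O "
          "en/IN/B-PP/B-PNP la/DT/B-NP/I-PNP estera/NN/I-NP/I-PNP ././O/O")

tokens = read_tagged(tagged)
nouns = [t["word"] for t in tokens if t["pos"] == "NN"]
print(nouns)
```

<p>For real work, the <span class="inline_code">Text</span> and <span class="inline_code">Sentence</span> objects returned by <span class="inline_code">split()</span> are the better interface.</p>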
<p>The parser is trained on the Spanish portion of <a href="http://www.lsi.upc.edu/~nlp/wikicorpus/" target="_blank">Wikicorpus</a>&nbsp;using 1.5M words from the tagged sections 10,000-15,000. The accuracy is around 92%.&nbsp;The original <a href="http://www.lsi.upc.edu/~nlp/SVMTool/parole.html" target="_blank">Parole</a>&nbsp;tagset is mapped to the <a href="mbsp-tags.html">Penn Treebank</a> tagset. If you need to work with the original tags you can also use&nbsp;<span class="inline_code">parse()</span> with an optional parameter <span class="inline_code">tagset="parole"</span>.</p>
<p class="small"><span style="text-decoration: underline;">Reference</span>:&nbsp;Reese, S., Boleda, G., Cuadros, M., Padró, L., Rigau, G. (2010).&nbsp;<br />Wikicorpus: A Word-Sense Disambiguated Multilingual Wikipedia Corpus.&nbsp;<em>Proceedings of LREC'10</em>.&nbsp;</p>
<h3>Sentiment analysis</h3>
<p>There's no&nbsp;<span class="inline_code">sentiment()</span> function for Spanish yet.</p>
</div>
</div></div>
</div>
</div>
</div>
</div>
</div>
</div>
<script>
SyntaxHighlighter.all();
</script>
</body>
</html>

@ -0,0 +1,590 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>pattern-fr</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link type="text/css" rel="stylesheet" href="../clips.css" />
<style>
/* Small fixes because we omit the online layout.css. */
h3 { line-height: 1.3em; }
#page { margin-left: auto; margin-right: auto; }
#header, #header-inner { height: 175px; }
#header { border-bottom: 1px solid #C6D4DD; }
table { border-collapse: collapse; }
#checksum { display: none; }
</style>
<link href="../js/shCore.css" rel="stylesheet" type="text/css" />
<link href="../js/shThemeDefault.css" rel="stylesheet" type="text/css" />
<script language="javascript" src="../js/shCore.js"></script>
<script language="javascript" src="../js/shBrushXml.js"></script>
<script language="javascript" src="../js/shBrushJScript.js"></script>
<script language="javascript" src="../js/shBrushPython.js"></script>
</head>
<body class="node-type-page one-sidebar sidebar-right section-pages">
<div id="page">
<div id="page-inner">
<div id="header"><div id="header-inner"></div></div>
<div id="content">
<div id="content-inner">
<div class="node node-type-page">
<div class="node-inner">
<div class="breadcrumb">View online at: <a href="http://www.clips.ua.ac.be/pages/pattern-fr" class="noexternal" target="_blank">http://www.clips.ua.ac.be/pages/pattern-fr</a></div>
<h1>pattern.fr</h1>
<!-- Parsed from the online documentation. -->
<div id="node-1697" class="node node-type-page"><div class="node-inner">
<div class="content">
<p><span class="big">The pattern.fr module contains a fast part-of-speech tagger for French (identifies nouns, adjectives, verbs, etc. in a sentence), sentiment analysis, and tools for French verb conjugation and noun singularization &amp; pluralization.</span></p>
<p>It can be used by itself or with other&nbsp;<a href="pattern.html">pattern</a>&nbsp;modules:&nbsp;<a href="pattern-web.html">web</a>&nbsp;|&nbsp;<a href="pattern-db.html">db</a>&nbsp;| <a href="pattern-en.html">en</a>&nbsp;|&nbsp;<a href="pattern-search.html">search</a>&nbsp;|&nbsp;<a href="pattern-vector.html">vector</a>&nbsp;|&nbsp;<a href="pattern-graph.html">graph</a>.</p>
<p><img src="../g/pattern_schema_fr.gif" alt="" /></p>
<hr />
<h2>Documentation</h2>
<p>The functions in this module take the same parameters and return the same values as their counterparts in <a href="pattern-en.html">pattern.en</a>. Refer to the documentation there for more details.&nbsp;&nbsp;</p>
<h3>Noun singularization &amp; pluralization</h3>
<p>For French nouns there is <span class="inline_code">singularize()</span> and <span class="inline_code">pluralize()</span>.&nbsp;The implementation uses a statistical approach with 93% accuracy for singularization and 92% for pluralization.</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.fr import singularize, pluralize
&gt;&gt;&gt;
&gt;&gt;&gt; print singularize('chats')
&gt;&gt;&gt; print pluralize('chat')
chat
chats </pre></div>
<h3>Verb conjugation</h3>
<p>For French verbs there is <span class="inline_code">conjugate()</span>, <span class="inline_code">lemma()</span>, <span class="inline_code">lexeme()</span> and <span class="inline_code">tenses()</span>.&nbsp;The lexicon for verb conjugation contains about 1,750 common French verbs (constructed with Bob Salita's verb conjugation rules).&nbsp;For unknown verbs it will fall back to regular expressions with an accuracy of about 83%.&nbsp;</p>
<p>French verbs have more tenses than English verbs. In particular, the plural differs for each person, and there are additional forms for the&nbsp;<span class="inline_code">FUTURE</span>&nbsp;tense, the&nbsp;<span class="inline_code">IMPERATIVE</span>, <span class="inline_code">CONDITIONAL</span> and&nbsp;<span class="inline_code">SUBJUNCTIVE</span>&nbsp;mood and the&nbsp;<span class="inline_code">PERFECTIVE</span>&nbsp;aspect:</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.fr import conjugate
&gt;&gt;&gt; from pattern.fr import INFINITIVE, PRESENT, PAST, SG, SUBJUNCTIVE, PERFECTIVE
&gt;&gt;&gt;
&gt;&gt;&gt; print conjugate('suis', INFINITIVE)
&gt;&gt;&gt; print conjugate('suis', PRESENT, 1, SG, mood=SUBJUNCTIVE)
&gt;&gt;&gt; print conjugate('suis', PAST, 3, SG)
&gt;&gt;&gt; print conjugate('suis', PAST, 3, SG, aspect=PERFECTIVE)
être
sois
était
fut </pre></div>
<p>For <span class="inline_code">PAST</span>&nbsp;tense + <span class="inline_code">PERFECTIVE</span>&nbsp;aspect we can also use <span class="inline_code">PRETERITE</span>&nbsp;(<em>passé simple</em>). For <span class="inline_code">PAST</span>&nbsp;tense + <span class="inline_code">IMPERFECTIVE</span>&nbsp;aspect we can also use <span class="inline_code">IMPERFECT</span>&nbsp;(<em>imparfait</em>):</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.fr import conjugate
&gt;&gt;&gt; from pattern.fr import IMPERFECT, PRETERITE
&gt;&gt;&gt;
&gt;&gt;&gt; print conjugate('suis', IMPERFECT, 3, SG)
&gt;&gt;&gt; print conjugate('suis', PRETERITE, 3, SG)
était
fut </pre></div>
<p>&nbsp;The <span class="inline_code">conjugate()</span> function takes the following optional parameters:</p>
<table class="border">
<tbody>
<tr>
<td class="smallcaps">Tense</td>
<td class="smallcaps">Person</td>
<td class="smallcaps">Number</td>
<td class="smallcaps">Mood</td>
<td class="smallcaps">Aspect</td>
<td class="smallcaps">Alias</td>
<td class="smallcaps">Example</td>
</tr>
<tr>
<td class="inline_code">INFINITIVE</td>
<td class="inline_code">None</td>
<td class="inline_code">None</td>
<td class="inline_code">None</td>
<td class="inline_code">None</td>
<td class="inline_code">"inf"</td>
<td><em>être</em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1sg"</td>
<td><em>je&nbsp;<span style="text-decoration: underline;">suis</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sg"</td>
<td><em>tu&nbsp;<span style="text-decoration: underline;">es</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sg"</td>
<td><em>il&nbsp;<span style="text-decoration: underline;">est</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1pl"</td>
<td><em>nous&nbsp;<span style="text-decoration: underline;">sommes</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2pl"</td>
<td><em>vous&nbsp;<span style="text-decoration: underline;">êtes</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3pl"</td>
<td><em>ils&nbsp;<span style="text-decoration: underline;">sont</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">None</td>
<td class="inline_code">None</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PROGRESSIVE</td>
<td class="inline_code">"part"</td>
<td><em>étant</em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">IMPERATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sg!"</td>
<td><em>sois</em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">IMPERATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1pl!"</td>
<td><em>soyons</em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">IMPERATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2pl!"</td>
<td><em>soyez</em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">CONDITIONAL</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1sg-&gt;"</td>
<td><em>je&nbsp;<span style="text-decoration: underline;">serais</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">CONDITIONAL</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sg-&gt;"</td>
<td><em>tu&nbsp;<span style="text-decoration: underline;">serais</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">CONDITIONAL</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sg-&gt;"</td>
<td><em>il&nbsp;<span style="text-decoration: underline;">serait</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">CONDITIONAL</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1pl-&gt;"</td>
<td><em>nous&nbsp;<span style="text-decoration: underline;">serions</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">CONDITIONAL</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2pl-&gt;"</td>
<td><em>vous&nbsp;<span style="text-decoration: underline;">seriez</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">CONDITIONAL</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3pl-&gt;"</td>
<td><em>ils&nbsp;<span style="text-decoration: underline;">seraient</span></em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1sg?"</td>
<td><em>je&nbsp;<span style="text-decoration: underline;">sois</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sg?"</td>
<td><em>tu&nbsp;<span style="text-decoration: underline;">sois</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sg?"</td>
<td><em>il&nbsp;<span style="text-decoration: underline;">soit</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1pl?"</td>
<td><em>nous&nbsp;<span style="text-decoration: underline;">soyons</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2pl?"</td>
<td><em>vous&nbsp;<span style="text-decoration: underline;">soyez</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3pl?"</td>
<td><em>ils&nbsp;<span style="text-decoration: underline;">soient</span></em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1sgp"</td>
<td><em>j'&nbsp;<span style="text-decoration: underline;">étais</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sgp"</td>
<td><em>tu&nbsp;<span style="text-decoration: underline;">étais</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sgp"</td>
<td><em>il&nbsp;<span style="text-decoration: underline;">était</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1ppl"</td>
<td><em>nous&nbsp;<span style="text-decoration: underline;">étions</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2ppl"</td>
<td><em>vous&nbsp;<span style="text-decoration: underline;">étiez</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3ppl"</td>
<td><em>ils&nbsp;<span style="text-decoration: underline;">étaient</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">None</td>
<td class="inline_code">None</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PROGRESSIVE</td>
<td class="inline_code">"ppart"</td>
<td><em>été</em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PERFECTIVE</td>
<td class="inline_code">"1sgp+"</td>
<td><em>je&nbsp;<span style="text-decoration: underline;">fus</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PERFECTIVE</td>
<td class="inline_code">"2sgp+"</td>
<td><em>tu&nbsp;<span style="text-decoration: underline;">fus</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PERFECTIVE</td>
<td class="inline_code">"3sgp+"</td>
<td><em>il&nbsp;<span style="text-decoration: underline;">fut</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PERFECTIVE</td>
<td class="inline_code">"1ppl+"</td>
<td><em>nous&nbsp;<span style="text-decoration: underline;">fûmes</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PERFECTIVE</td>
<td class="inline_code">"2ppl+"</td>
<td><em>vous&nbsp;<span style="text-decoration: underline;">fûtes</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PERFECTIVE</td>
<td class="inline_code">"3ppl+"</td>
<td><em>ils&nbsp;<span style="text-decoration: underline;">furent</span></em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1sgp?"</td>
<td><em>je&nbsp;<span style="text-decoration: underline;">fusse</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sgp?"</td>
<td><em>tu&nbsp;<span style="text-decoration: underline;">fusses</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sgp?"</td>
<td><em>il&nbsp;<span style="text-decoration: underline;">fût</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1ppl?"</td>
<td><em>nous&nbsp;<span style="text-decoration: underline;">fussions</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2ppl?"</td>
<td><em>vous&nbsp;<span style="text-decoration: underline;">fussiez</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3ppl?"</td>
<td><em>ils&nbsp;<span style="text-decoration: underline;">fussent</span></em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">FUTURE</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1sgf"</td>
<td><em>je&nbsp;<span style="text-decoration: underline;">serai</span></em></td>
</tr>
<tr>
<td class="inline_code">FUTURE</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sgf"</td>
<td><em>tu&nbsp;<span style="text-decoration: underline;">seras</span></em></td>
</tr>
<tr>
<td class="inline_code">FUTURE</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sgf"</td>
<td><em>il&nbsp;<span style="text-decoration: underline;">sera</span></em></td>
</tr>
<tr>
<td class="inline_code">FUTURE</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1plf"</td>
<td><em>nous&nbsp;<span style="text-decoration: underline;">serons</span></em></td>
</tr>
<tr>
<td class="inline_code">FUTURE</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2plf"</td>
<td><em>vous&nbsp;<span style="text-decoration: underline;">serez</span></em></td>
</tr>
<tr>
<td class="inline_code">FUTURE</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3plf"</td>
<td><em>ils&nbsp;<span style="text-decoration: underline;">seront</span></em></td>
</tr>
</tbody>
</table>
<p>Instead of optional parameters, a single short alias, or&nbsp;<span class="inline_code">PARTICIPLE</span> or <span class="inline_code">PAST+PARTICIPLE</span> can also be given. With no parameters, the infinitive form of the verb is returned.</p>
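<p>One way to read the table is as a lookup from each short alias to the full set of parameters. The sketch below spells this out for a few rows. The string values given to the constants and the <span class="inline_code">expand()</span> helper are assumptions for illustration; they are not the definitions used inside <span class="inline_code">pattern.fr</span>.</p>

```python
# String stand-ins for the constants exported by pattern.fr.
PRESENT, PAST, FUTURE = "present", "past", "future"
SG, PL = "singular", "plural"
INDICATIVE, SUBJUNCTIVE = "indicative", "subjunctive"
IMPERFECTIVE, PERFECTIVE = "imperfective", "perfective"

# A few rows from the table: alias -> (tense, person, number, mood, aspect).
ALIASES = {
    "1sg":   (PRESENT, 1, SG, INDICATIVE, IMPERFECTIVE),   # je suis
    "3pl":   (PRESENT, 3, PL, INDICATIVE, IMPERFECTIVE),   # ils sont
    "1sg?":  (PRESENT, 1, SG, SUBJUNCTIVE, IMPERFECTIVE),  # je sois
    "3sgp":  (PAST, 3, SG, INDICATIVE, IMPERFECTIVE),      # il était
    "3sgp+": (PAST, 3, SG, INDICATIVE, PERFECTIVE),        # il fut
    "3plf":  (FUTURE, 3, PL, INDICATIVE, IMPERFECTIVE),    # ils seront
}

def expand(alias):
    """Return the full parameter tuple for a short alias."""
    return ALIASES[alias]

tense, person, number, mood, aspect = expand("3sgp+")
print(tense + " " + aspect)
```

<p>Read this way, passing the alias <span class="inline_code">"3sgp+"</span> names the same table row as <span class="inline_code">PAST, 3, SG</span> with <span class="inline_code">aspect=PERFECTIVE</span>.</p>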
<p class="small"><span style="text-decoration: underline;">Reference</span><span>: Salita, B. (2011).&nbsp;<em>French Verb Conjugation Rules</em>. Retrieved from:&nbsp;</span><span><a class="noexternal" style="color: inherit;" href="http://fvcr.sourceforge.net/" target="_blank">http://fvcr.sourceforge.net</a>.</span></p>
<h3>Attributive &amp; predicative adjectives&nbsp;</h3>
<p>French adjectives inflect with an <span class="inline_code">-e</span>,&nbsp;<span class="inline_code">-s</span> or&nbsp;<span class="inline_code">-es</span>&nbsp;suffix depending on gender and number. There are many irregular cases (e.g., <em>curieux</em>&nbsp;→ <em>une fille curieuse</em>). You can get the base form with the <span class="inline_code">predicative()</span> function.&nbsp;A statistical approach is used with an accuracy of 95%.</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.fr import predicative
&gt;&gt;&gt; print predicative('curieuse')
curieux </pre></div>
<h3>Sentiment analysis</h3>
<p class="example">For opinion mining there is <span class="inline_code">sentiment()</span>, which returns a (<span class="inline_code">polarity</span>, <span class="inline_code">subjectivity</span>)-tuple, based on a lexicon of adjectives. Polarity is a value between <span class="inline_code">-1.0</span> and <span class="inline_code">+1.0</span>, subjectivity between <span class="inline_code">0.0</span> and <span class="inline_code">1.0</span>. The accuracy is around 74% (P 0.77, R 0.73) for book reviews:</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.fr import sentiment
&gt;&gt;&gt; print sentiment('Un livre magnifique!')
(1.0, 1.0) </pre></div>
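<p>To make the idea of a lexicon-based score concrete, here is a minimal sketch that averages the scores of the adjectives it recognizes. The mini-lexicon, its values, and the plain averaging are assumptions for illustration; the actual lexicon and scoring in <span class="inline_code">pattern.fr</span> are more elaborate.</p>

```python
# Hypothetical mini-lexicon: adjective -> (polarity, subjectivity).
# The scores are invented for this example, not taken from pattern.fr.
LEXICON = {
    "magnifique": (1.0, 1.0),
    "ennuyeux": (-0.5, 0.8),
    "long": (-0.1, 0.3),
}

def sentiment_sketch(text, lexicon=LEXICON):
    """Average the scores of known adjectives; (0.0, 0.0) if none match."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    scores = [lexicon[w] for w in words if w in lexicon]
    if not scores:
        return (0.0, 0.0)
    polarity = sum(p for p, s in scores) / len(scores)
    subjectivity = sum(s for p, s in scores) / len(scores)
    return (polarity, subjectivity)

print(sentiment_sketch("Un livre magnifique!"))
```

<p>Words missing from the lexicon contribute nothing, which is one reason accuracy depends on the domain of the text.</p>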
<h3>Parser</h3>
<p>For parsing there is <span class="inline_code">parse()</span>, <span class="inline_code">parsetree()</span> and&nbsp;<span class="inline_code">split()</span>. The <span class="inline_code">parse()</span> function annotates words in the given string with their part-of-speech <a class="link-maintenance" href="mbsp-tags.html">tags</a>&nbsp;(e.g., <span class="postag">NN</span> for nouns and <span class="postag">VB</span> for verbs). The <span class="inline_code">parsetree()</span> function takes a string and returns a tree of nested objects (<span class="inline_code">Text</span>&nbsp;→ <span class="inline_code">Sentence</span>&nbsp;→ <span class="inline_code">Chunk</span>&nbsp;→ <span class="inline_code">Word</span>). The <span class="inline_code">split()</span> function takes the output of <span class="inline_code">parse()</span> and returns a <span class="inline_code">Text</span>.&nbsp;See the <span class="inline_code">pattern.en</span> documentation&nbsp;(<span class="link-maintenance" style="color: #78aaff;"><a style="color: #8caaff; outline-style: none !important; outline-width: initial !important; outline-color: initial !important;" href="pattern-en.html#tree">here</a></span>) for how to manipulate <span class="inline_code">Text</span>&nbsp;objects.</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.fr import parse, split
&gt;&gt;&gt;
&gt;&gt;&gt; s = parse(u"Le chat noir s'était assis sur le tapis.")
&gt;&gt;&gt; for sentence in split(s):
&gt;&gt;&gt;     print sentence
Sentence('Le/DT/B-NP/O chat/NN/I-NP/O noir/JJ/I-NP/O'
"s'/PRP/B-NP/O était/VB/B-VP/O assis/VBN/I-VP/O"
'sur/IN/B-PP/B-PNP le/DT/B-NP/I-PNP tapis/NN/I-NP/I-PNP ././O/O')
</pre></div>
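<p>The chunk tags in the output above use a <span class="inline_code">B-</span> prefix for the first word of a chunk and <span class="inline_code">I-</span> for words that continue it. The following sketch groups words into chunks from the tagged string alone. The <span class="inline_code">chunks()</span> helper is hypothetical and written for this example; in practice the <span class="inline_code">Sentence</span> objects already expose their chunks.</p>

```python
def chunks(tagged):
    """Group 'word/POS/CHUNK/PNP' tokens into (chunk type, words) pairs."""
    groups = []
    for token in tagged.split():
        word, pos, chunk, pnp = token.rsplit("/", 3)
        if chunk == "O":
            continue  # token outside any chunk, e.g. punctuation
        prefix, kind = chunk.split("-", 1)
        if prefix == "B" or not groups or groups[-1][0] != kind:
            groups.append((kind, [word]))   # B- starts a new chunk
        else:
            groups[-1][1].append(word)      # I- continues the current one
    return groups

tagged = ("Le/DT/B-NP/O chat/NN/I-NP/O noir/JJ/I-NP/O "
          "s'/PRP/B-NP/O était/VB/B-VP/O assis/VBN/I-VP/O "
          "sur/IN/B-PP/B-PNP le/DT/B-NP/I-PNP tapis/NN/I-NP/I-PNP ././O/O")

for kind, words in chunks(tagged):
    print(kind + ": " + " ".join(words))
```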
<p>The parser is based on <a href="http://alpage.inria.fr/~sagot/lefff-en.html">Le<em>fff</em></a>. For words in Le<em>fff</em> that can have multiple part-of-speech tags, we used <a href="http://www.lexique.org/">Lexique</a> to find the most frequent POS-tag.&nbsp;</p>
<p class="small"><span style="text-decoration: underline;">References</span>:&nbsp;</p>
<p class="small">Sagot, B. (2010).&nbsp;The Le<em>fff</em>, a freely available and large-coverage morphological and syntactic lexicon for French.&nbsp;<em>Proceedings of LREC'10</em>.</p>
<p class="small">New, B., Pallier, C., Ferrand, L. &amp; Matos, R. (2001). A lexical database for contemporary French: LEXIQUE. <em>L'année Psychologique</em>.&nbsp;</p>
</div>
</div></div>
</div>
</div>
</div>
</div>
</div>
</div>
<script>
SyntaxHighlighter.all();
</script>
</body>
</html>

@ -0,0 +1,431 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>pattern-graph</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link type="text/css" rel="stylesheet" href="../clips.css" />
<style>
/* Small fixes because we omit the online layout.css. */
h3 { line-height: 1.3em; }
#page { margin-left: auto; margin-right: auto; }
#header, #header-inner { height: 175px; }
#header { border-bottom: 1px solid #C6D4DD; }
table { border-collapse: collapse; }
#checksum { display: none; }
</style>
<link href="../js/shCore.css" rel="stylesheet" type="text/css" />
<link href="../js/shThemeDefault.css" rel="stylesheet" type="text/css" />
<script language="javascript" src="../js/shCore.js"></script>
<script language="javascript" src="../js/shBrushXml.js"></script>
<script language="javascript" src="../js/shBrushJScript.js"></script>
<script language="javascript" src="../js/shBrushPython.js"></script>
</head>
<body class="node-type-page one-sidebar sidebar-right section-pages">
<div id="page">
<div id="page-inner">
<div id="header"><div id="header-inner"></div></div>
<div id="content">
<div id="content-inner">
<div class="node node-type-page">
<div class="node-inner">
<div class="breadcrumb">View online at: <a href="http://www.clips.ua.ac.be/pages/pattern-graph" class="noexternal" target="_blank">http://www.clips.ua.ac.be/pages/pattern-graph</a></div>
<h1>pattern.graph</h1>
<!-- Parsed from the online documentation. -->
<div id="node-1392" class="node node-type-page"><div class="node-inner">
<div class="content">
<p class="big"><span style="font-size: 16px;">The pattern.graph module has tools for graph analysis (shortest path, centrality) and graph visualization in the browser. A graph is a network of nodes connected by edges. It can be used for example to study social networks or to model semantic relationships between concepts.</span></p>
<p>It can be used by itself or with other <a href="pattern.html">pattern</a> modules: <a href="pattern-web.html">web</a> | <a href="pattern-db.html">db</a> | <a href="pattern-en.html">en</a> | <a href="pattern-search.html">search</a> | <a href="pattern-vector.html">vector</a> | graph.</p>
<p><img style="border: 0px initial initial;" src="../g/pattern_schema.gif" alt="" width="620" height="180" /></p>
<hr />
<h2>Documentation</h2>
<ul>
<li><a href="#node">Node</a></li>
<li><a href="#edge">Edge</a></li>
<li><a href="#graph">Graph</a></li>
<li><a href="#layout">Graph layout</a></li>
<li><a href="#utility">Graph adjacency</a></li>
<li><a href="#canvas">Visualization</a>&nbsp;<span class="link-maintenance">(</span><a class="link-maintenance" href="#canvas"><span class="smallcaps link-maintenance">export</span></a><span class="link-maintenance">)</span></li>
<li><a href="#javascript">graph.js</a></li>
</ul>
<p>&nbsp;</p>
<hr />
<h2><a name="node"></a>Node</h2>
<p>A <span class="inline_code">Node</span> is an element with a unique id (a string or <span class="inline_code">int</span>) in a graph. A graph is a network of nodes and edges (connections between nodes). For example, the World Wide Web (WWW) can be represented as a vast graph with websites as nodes, website URLs as node ids, and hyperlinks as edges. Graph analysis can then be used to find important nodes (i.e., popular websites) and the shortest path between them.</p>
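<p>The web example can be sketched with plain Python data structures, independent of the classes documented on this page. The sketch below stores a small, made-up graph as a dictionary of node ids, finds a shortest path with breadth-first search, and counts incoming links as a crude measure of importance. The site names and the <span class="inline_code">shortest_path()</span> helper are hypothetical.</p>

```python
from collections import deque

# Hypothetical web graph: node id (a URL) -> ids of the pages it links to.
links = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["d.example"],
    "c.example": ["d.example"],
    "d.example": ["e.example"],
    "e.example": [],
}

def shortest_path(links, start, goal):
    """Breadth-first search; returns a list of node ids, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in links.get(path[-1], []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

# In-degree as a crude importance score: how many pages link to each id.
indegree = {node: 0 for node in links}
for targets in links.values():
    for target in targets:
        indegree[target] += 1

print(shortest_path(links, "a.example", "e.example"))
```

<p>Breadth-first search is enough here because every hyperlink counts as one step; weighted edges would call for a different algorithm.</p>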
<p>A <span class="inline_code">Node</span> takes a number of optional parameters used to style the <a class="link-maintenance" href="#canvas">visualization</a> of the graph: <span class="inline_code">radius</span> (node size), <span class="inline_code">text</span>, <span class="inline_code">fill</span> and <span class="inline_code">stroke</span> (colors; each a tuple of <a href="http://en.wikipedia.org/wiki/RGBA">RGBA</a> values between <span class="inline_code">0.0</span>-<span class="inline_code">1.0</span>), <span class="inline_code">strokewidth</span>, <span class="inline_code">font</span>, <span class="inline_code">fontsize</span> and <span class="inline_code">fontweight</span>.</p>
<pre class="brush:python; gutter:false; light:true;">node = Node(id="", **kwargs)</pre><pre class="brush:python; gutter:false; light:true;">node.graph # Parent Graph.
node.id # Unique string or int.
node.links # List of Node objects.
node.edges # List of Edge objects.
node.edge(node, reverse=False)
</pre><pre class="brush:python; gutter:false; light:true;">node.weight # Eigenvector centrality (0.0-1.0).
node.centrality # Betweenness centrality (0.0-1.0).
node.degree # Degree centrality (0.0-1.0). </pre><pre class="brush:python; gutter:false; light:true;">node.x # 2D horizontal offset.
node.y # 2D vertical offset.
node.force # 2D Vector, updated by Graph.layout.
node.radius # Default: 5
node.fill # Default: None
node.stroke # Default: (0,0,0,1)
node.strokewidth # Default: 1
node.text # Text object, or None.</pre><pre class="brush:python; gutter:false; light:true;">node.flatten(depth=1, traversable=lambda node, edge: True)
</pre><ul>
<li><span class="inline_code">Node.edge(node)</span> returns the <span class="inline_code">Edge</span> from this node to the given <span class="inline_code">node</span>, or <span class="inline_code">None</span>.</li>
<li><span class="inline_code">Node.flatten()</span> returns a list with the node itself (<span class="inline_code">depth=0</span>), directly connected nodes (<span class="inline_code">depth=1</span>), nodes connected to those nodes (<span class="inline_code">depth=2</span>), and so on.</li>
</ul>
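<p>A minimal sketch of the idea behind <span class="inline_code">Node.flatten()</span>, written in plain Python rather than taken from Pattern's source: a breadth-first expansion that stops after <span class="inline_code">depth</span> hops. The <span class="inline_code">flatten()</span> helper and the <span class="inline_code">links</span> dictionary (a stand-in for <span class="inline_code">Node.links</span>) are assumptions made for illustration, and the <span class="inline_code">traversable</span> parameter is left out.</p>

```python
def flatten(links, start, depth=1):
    """Return start plus every node id within `depth` hops (hypothetical helper)."""
    found = [start]
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for neighbor in links.get(node, []):
                if neighbor not in found:
                    found.append(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return found

# Undirected links of a small example graph (node id -> linked node ids).
links = {
    'cat':   ['tail', 'purr'],
    'dog':   ['tail', 'bark'],
    'tail':  ['cat', 'dog'],
    'purr':  ['cat', 'sound'],
    'bark':  ['dog', 'sound'],
    'sound': ['purr', 'bark'],
}

print(flatten(links, 'cat', depth=0))  # ['cat']
print(flatten(links, 'cat', depth=1))  # ['cat', 'tail', 'purr']
```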
<p><span class="smallcaps">node weight and centrality</span></p>
<p>A well-known task in graph analysis is measuring how important or <em>central</em> each node in the graph is. The pattern.graph module has three centrality measurements, adopted from <a href="http://networkx.lanl.gov/">NetworkX</a>.</p>
<p><span class="inline_code">Node.weight</span> is the node's <em>eigenvector</em> centrality (= incoming traffic) as a value between <span class="inline_code">0.0</span>-<span class="inline_code">1.0</span>. Nodes with more (indirect) incoming edges have a higher weight. For example, in the WWW, popular websites are those that are often linked to, where the popularity of the referring websites is taken into account.</p>
<p><span class="inline_code">Node.centrality</span> is the node's <em>betweenness</em> centrality (= passing traffic) as a value between <span class="inline_code">0.0</span>-<span class="inline_code">1.0</span>. Nodes that occur more frequently in paths between other nodes have a higher betweenness. They are often found at the intersection of different clusters of nodes (like a broker or a bridge).</p>
<p><span class="inline_code">Node.degree</span> is the node's <em>degree</em> centrality (= local traffic) as a value between <span class="inline_code">0.0</span>-<span class="inline_code">1.0</span>. Nodes with more edges have a higher degree.</p>
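<p>Degree centrality is the simplest of the three measures to reproduce by hand. The sketch below uses the textbook normalization (number of edges divided by the number of other nodes); whether Pattern normalizes in exactly this way is an assumption, and the <span class="inline_code">degree_centrality()</span> helper and the example edges are hypothetical.</p>

```python
def degree_centrality(edges):
    """Map each node id to (number of edges) / (number of other nodes)."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    others = max(len(degree) - 1, 1)  # avoid dividing by zero in tiny graphs
    return dict((n, d / float(others)) for n, d in degree.items())

# Hypothetical graph: 'a' is a hub, 'e' hangs off 'd'.
edges = [('a', 'b'), ('a', 'c'), ('a', 'd'), ('d', 'e')]
scores = degree_centrality(edges)
for n in sorted(scores, key=scores.get, reverse=True):
    print('%.2f %s' % (scores[n], n))
# The hub 'a' scores 0.75, 'd' scores 0.50 and the leaves score 0.25.
```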
<p>&nbsp;</p>
<hr />
<h2><a name="edge"></a>Edge</h2>
<p>An <span class="inline_code">Edge</span> is a connection between two nodes. Its <span class="inline_code">weight</span> defines the importance of the connection. Edges with a higher weight are preferred when traversing the path between two (indirectly) connected nodes.</p>
<p>An <span class="inline_code">Edge</span> takes optional parameters <span class="inline_code">stroke</span> (a tuple of <a href="http://en.wikipedia.org/wiki/RGBA">RGBA</a> values between <span class="inline_code">0.0</span>-<span class="inline_code">1.0</span>) and <span class="inline_code">strokewidth</span>, which can be used to style the graph&nbsp;<a class="link-maintenance" href="#canvas">visualization</a>.</p>
<pre class="brush:python; gutter:false; light:true;">edge = Edge(node1, node2, weight=0.0, length=1.0, type=None, **kwargs)</pre><pre class="brush:python; gutter:false; light:true;">edge.node1 # Node (sender).
edge.node2 # Node (receiver).
edge.weight # Connection strength.
edge.length # Length modifier for the visualization.
edge.type # Useful in semantic networks.
edge.stroke # Default: (0,0,0,1)
edge.strokewidth # Default: 1 </pre><p class="smallcaps"><br />directed graph</p>
<p>An edge can be traversed in both directions: from <span class="inline_code">node1</span> &rarr; <span class="inline_code">node2</span>, and from <span class="inline_code">node2</span> &rarr; <span class="inline_code">node1</span>. The <span class="inline_code">Graph.shortest_path()</span> and <span class="inline_code">Graph.betweenness_centrality()</span> methods have a <span class="inline_code">directed</span> parameter which can be set to <span class="inline_code">True</span>, so that edges are only traversed from <span class="inline_code">node1</span> &rarr; <span class="inline_code">node2</span>. This is called a directed graph. Evidently, it produces different shortest paths and node weights.</p>
<p>Two nodes can be connected by at most two edges (one in each direction). Otherwise, <span class="inline_code">Graph.add_edge()</span> simply returns the edge that already exists between the given nodes.</p>
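<p>To see what the <span class="inline_code">directed</span> setting changes, the plain-Python sketch below computes which nodes can be reached from a starting node, with and without following edges backwards. It does not use pattern.graph; the <span class="inline_code">reachable()</span> helper and the tuple representation of edges are assumptions for illustration.</p>

```python
def reachable(edges, start, directed=False):
    """Return the set of node ids that can be reached from start."""
    seen = set([start])
    stack = [start]
    while stack:
        node = stack.pop()
        for node1, node2 in edges:
            if node1 == node:
                target = node2          # follow the edge node1 -> node2
            elif node2 == node and not directed:
                target = node1          # undirected: also walk node2 -> node1
            else:
                continue
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen

edges = [('cat', 'tail'), ('cat', 'purr'), ('purr', 'sound'),
         ('dog', 'tail'), ('dog', 'bark'), ('bark', 'sound')]

print(sorted(reachable(edges, 'purr')))                 # all six nodes
print(sorted(reachable(edges, 'purr', directed=True)))  # ['purr', 'sound']
```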
<p>&nbsp;</p>
<hr />
<h2><a name="graph"></a>Graph</h2>
<p>A <span class="inline_code">Graph</span> is a network of nodes connected by edges, with methods for finding paths between (indirectly) connected nodes.</p>
<pre class="brush:python; gutter:false; light:true;">graph = Graph(layout=SPRING, distance=10.0)</pre><pre class="brush:python; gutter:false; light:true;">graph[id] # Node with given id (Graph is a subclass of dict).
graph.nodes # List of Node objects.
graph.edges # List of Edge objects.
graph.density # &lt; 0.35 =&gt; sparse, &gt; 0.65 =&gt; dense
graph.layout # GraphSpringLayout.
graph.distance # GraphSpringLayout spacing.
</pre><pre class="brush:python; gutter:false; light:true;">graph.add_node(id) # Creates + returns new Node.
graph.add_edge(id1, id2) # Creates + returns new Edge.
graph.remove(node) # Removes given Node + edges.
graph.remove(edge) # Removes given Edge.
graph.prune(depth=0) # Removes nodes + edges if len(node.links) &lt;= depth.
graph.node(id) # Returns node with given id.
graph.edge(id1, id2) # Returns edge connecting the given nodes.
graph.copy(nodes=ALL) # Returns a new Graph.
graph.split() # Returns a list of (unconnected) graphs.
</pre><pre class="brush:python; gutter:false; light:true;">graph.eigenvector_centrality() # Updates all Node.weight values.
graph.betweenness_centrality() # Updates all Node.centrality values. </pre><pre class="brush:python; gutter:false; light:true;">graph.shortest_path(node1, node2, heuristic=None, directed=False)
graph.shortest_paths(node, heuristic=None, directed=False)
graph.paths(node1, node2, length=4)
graph.fringe(depth=0, traversable=lambda node, edge: True)
</pre><pre class="brush:python; gutter:false; light:true;">graph.update(iterations=10, weight=10, limit=0.5)</pre><ul>
<li><span class="inline_code"><span><span class="inline_code">Graph.add_node()</span></span></span> takes an id + any optional parameter of <span><span class="inline_code">Node</span></span>.</li>
<li><span class="inline_code">Graph.add_edge()</span> takes two id's + any optional parameter of <span class="inline_code">Edge</span>.<br />Both methods have an optional <span class="inline_code">base</span> parameter that defines the subclass of <span class="inline_code">Node</span> or <span class="inline_code">Edge</span> to use.</li>
</ul>
<ul>
<li><span class="inline_code">Graph.prune()</span> removes all nodes whose number of (undirected) connections is less than or equal to <span class="inline_code">depth</span>.</li>
<li><span class="inline_code">Graph.copy()</span> returns a new <span class="inline_code">Graph</span> from the given list of nodes.</li>
<li><span class="inline_code">Graph.split()</span> returns a list of unconnected subgraphs.</li>
</ul>
<ul>
<li><span class="inline_code"><span><span class="inline_code">Graph.paths()</span></span></span> returns all paths (each a list of nodes) &lt;= <span class="inline_code">length</span> connecting two given nodes.</li>
<li><span class="inline_code">Graph.shortest_path()</span> returns a list of nodes connecting the two given nodes.</li>
<li><span class="inline_code">Graph.shortest_paths()</span> returns a dictionary of node &rarr; shortest path.<br />The optional <span class="inline_code">heuristic</span> function takes two node id's and returns a penalty (<span class="inline_code">0.0</span>-<span class="inline_code">1.0</span>) for traversing their edges. With <span class="inline_code">directed=True</span>, edges are only traversable in one direction.</li>
</ul>
<ul>
<li><span class="inline_code">Graph.fringe()</span> returns a list of <em>leaf</em> nodes.<br />With <span class="inline_code">depth=0</span>, returns the nodes with one edge.<br />With <span class="inline_code">depth=1</span>, returns the nodes with one edge + the connected nodes, etc.</li>
</ul>
<p>For example:</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.graph import Graph
&gt;&gt;&gt;
&gt;&gt;&gt; g = Graph()
&gt;&gt;&gt; for n1, n2 in (
&gt;&gt;&gt;   ('cat', 'tail'), ('cat', 'purr'), ('purr', 'sound'),
&gt;&gt;&gt;   ('dog', 'tail'), ('dog', 'bark'), ('bark', 'sound')):
&gt;&gt;&gt;     g.add_node(n1)
&gt;&gt;&gt;     g.add_node(n2)
&gt;&gt;&gt;     g.add_edge(n1, n2, weight=0.0, type='is-related-to')
&gt;&gt;&gt;
&gt;&gt;&gt; for n in sorted(g.nodes, key=lambda n: n.weight):
&gt;&gt;&gt;     print '%.2f' % n.weight, n
0.00 Node(id='cat')
0.00 Node(id='dog')
0.07 Node(id='purr')
0.07 Node(id='bark')
0.15 Node(id='tail')
1.00 Node(id='sound')
</pre></div>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; for n in g.shortest_path('purr', 'bark'):
&gt;&gt;&gt;     print n
Node(id='purr')
Node(id='sound')
Node(id='bark')
</pre></div>
<table border="0">
<tbody>
<tr>
<td>
<p>When sorted by <span class="inline_code">Node.weight</span> (i.e., eigenvector centrality), <em>sound</em> is the most important node in the network. This can be explained by observing the visualization on the right. Most nodes (indirectly) connect to <em>sound</em> or <em>tail</em>. No nodes connect to <em>dog</em> or <em>cat</em>, so these are the least important in the network (weight <span class="inline_code">0.0</span>).</p>
<p>By default, nodes with a higher weight will have a larger radius in the visualization.</p>
</td>
<td><img src="../g/pattern_graph3.jpg" alt="" width="170" height="155" /></td>
</tr>
</tbody>
</table>
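<p>The shortest path from <em>purr</em> to <em>bark</em> can be reproduced without the library. The sketch below swaps in a plain breadth-first search that ignores edge weights and heuristics, which is enough here because every edge in the example has weight <span class="inline_code">0.0</span>; it is not how <span class="inline_code">Graph.shortest_path()</span> is implemented. The <span class="inline_code">shortest_path()</span> helper working on id tuples is hypothetical.</p>

```python
from collections import deque

def shortest_path(edges, id1, id2, directed=False):
    """Return the node ids on a shortest path from id1 to id2, or None."""
    links = {}
    for a, b in edges:
        links.setdefault(a, []).append(b)
        links.setdefault(b, [])
        if not directed:
            links[b].append(a)
    parent = {id1: None}   # also serves as the set of visited nodes
    queue = deque([id1])
    while queue:
        node = queue.popleft()
        if node == id2:
            path = []
            while node is not None:   # walk back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in links.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None

edges = [('cat', 'tail'), ('cat', 'purr'), ('purr', 'sound'),
         ('dog', 'tail'), ('dog', 'bark'), ('bark', 'sound')]

print(shortest_path(edges, 'purr', 'bark'))                 # ['purr', 'sound', 'bark']
print(shortest_path(edges, 'purr', 'bark', directed=True))  # None
```

<p>With <span class="inline_code">directed=True</span> no path exists in this sketch, because the edge between <em>bark</em> and <em>sound</em> points towards <em>sound</em>.</p>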
<p>&nbsp;</p>
<hr />
<h2><a name="layout"></a>Graph layout</h2>
<p>A <span class="inline_code">GraphLayout</span> updates node positions (<span class="inline_code">Node.x</span>, <span class="inline_code">Node.y</span>) iteratively each time <span class="inline_code">GraphLayout.update()</span> is called. The pattern.graph module currently has one implementation: <span class="inline_code">GraphSpringLayout</span>, which uses a force-based algorithm where edges are regarded as springs. Connected nodes are pulled closer together (attraction) while other nodes are pushed further apart (repulsion).</p>
<pre class="brush:python; gutter:false; light:true;">layout = GraphSpringLayout(graph)</pre><pre class="brush:python; gutter:false; light:true;">layout.graph # Graph owner.
layout.iterations # Starts at 0, +1 each update().
layout.bounds # (x, y, width, height)-tuple.</pre><pre class="brush:python; gutter:false; light:true;">layout.k # Force constant (4.0)
layout.force # Force multiplier (0.01)
layout.repulsion # Maximum repulsion radius (50)</pre><pre class="brush:python; gutter:false; light:true;">layout.update(weight=10.0, limit=0.5) # weight = Edge.weight multiplier.
layout.reset()
layout.copy(graph)</pre><p><span class="small"><span style="text-decoration: underline;">Reference</span>: Hellesoy, A. &amp; Hoover, D. (2006). http://ajaxian.com/archives/new-javascriptcanvas-graph-library</span></p>
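<p>The attraction and repulsion described above can be sketched in a few lines of plain Python. This is a simplified illustration of a force-based layout, not the formulas used by <span class="inline_code">GraphSpringLayout</span>; the <span class="inline_code">spring_step()</span> function and its parameters (<span class="inline_code">rest</span>, <span class="inline_code">k</span>, <span class="inline_code">repulsion</span>, <span class="inline_code">push</span>) are assumptions chosen for readability.</p>

```python
import math

def spring_step(pos, edges, rest=10.0, k=0.1, repulsion=50.0, push=0.5):
    """Return new {id: (x, y)} positions after one layout iteration."""
    force = dict((n, [0.0, 0.0]) for n in pos)
    linked = set(frozenset(e) for e in edges)

    def dist(a, b):
        return math.hypot(pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]) or 1e-9

    def shift(a, b, f):
        # Move a and b along the line between them; f > 0 pulls them together.
        dx = pos[b][0] - pos[a][0]
        dy = pos[b][1] - pos[a][1]
        force[a][0] += f * dx
        force[a][1] += f * dy
        force[b][0] -= f * dx
        force[b][1] -= f * dy

    # Attraction: every edge is a spring that relaxes towards its rest length.
    for a, b in edges:
        d = dist(a, b)
        shift(a, b, k * (d - rest) / d)
    # Repulsion: unconnected nodes inside the repulsion radius are pushed apart.
    nodes = sorted(pos)
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            d = dist(a, b)
            if frozenset((a, b)) not in linked and d < repulsion:
                shift(a, b, -push * (repulsion - d) / d)
    return dict((n, (pos[n][0] + force[n][0], pos[n][1] + force[n][1])) for n in pos)

pos = {'a': (0.0, 0.0), 'b': (100.0, 0.0)}
for _ in range(50):
    pos = spring_step(pos, [('a', 'b')])
print(round(pos['b'][0] - pos['a'][0], 2))  # close to the rest length of 10
```

<p>Calling the step function repeatedly mirrors how <span class="inline_code">GraphLayout.update()</span> is meant to be called once per iteration.</p>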
<p>&nbsp;</p>
<hr />
<h2><a name="utility"></a>Graph adjacency</h2>
<p>The pattern.graph module has a number of functions that can be used to modify graph edges:</p>
<pre class="brush:python; gutter:false; light:true;">unlink(graph, node1, node2=None)</pre><pre class="brush:python; gutter:false; light:true;">redirect(graph, node1, node2)</pre><pre class="brush:python; gutter:false; light:true;">cut(graph, node)</pre><pre class="brush:python; gutter:false; light:true;">insert(graph, node, a, b)</pre><ul>
<li style="margin-bottom: 0.3em;"><span class="inline_code">unlink()</span> removes the edge between <span class="inline_code">node1</span> and <span class="inline_code">node2</span>. <br />If only <span class="inline_code">node1</span> is given, removes all edges to + from it. This does not remove <span class="inline_code">node1</span> from the graph.</li>
<li style="margin-bottom: 0.3em;"><span class="inline_code">redirect()</span> connects <span class="inline_code">node1</span>'s edges to <span class="inline_code">node2</span> and removes&nbsp;<span class="inline_code">node1</span>.<br />If <span class="inline_code">A</span>, <span class="inline_code">B</span>, <span class="inline_code">C</span>, <span class="inline_code">D</span> are nodes and <span class="inline_code">A</span> &rarr; <span class="inline_code">B</span> and <span class="inline_code">C</span> &rarr; <span class="inline_code">D</span>, and we redirect <span class="inline_code">A</span> to <span class="inline_code">C</span>, then <span class="inline_code">C</span> &rarr; <span class="inline_code">B</span> and <span class="inline_code">C</span> &rarr; <span class="inline_code">D</span>.</li>
<li style="margin-bottom: 0.3em;"><span class="inline_code">cut()</span> removes the given <span class="inline_code">node</span>&nbsp;and connects the surrounding nodes. <br />If <span class="inline_code">A</span>, <span class="inline_code">B</span>, <span class="inline_code">C</span>, <span class="inline_code">D</span> are nodes and <span class="inline_code">A</span> &rarr; <span class="inline_code">B</span> and <span class="inline_code">B</span> &rarr; <span class="inline_code">C</span> and <span class="inline_code">B</span> &rarr; <span class="inline_code">D</span>, and we cut <span class="inline_code">B</span>, then <span class="inline_code">A</span> &rarr; <span class="inline_code">C</span> and <span class="inline_code">A</span> &rarr; <span class="inline_code">D</span>.</li>
<li><span class="inline_code">insert()</span> inserts the given <span class="inline_code">node</span> between node <span class="inline_code">a</span> and node <span class="inline_code">b</span>. <br />If <span class="inline_code">A</span>, <span class="inline_code">B</span>, <span class="inline_code">C</span> are nodes and <span class="inline_code">A</span> &rarr; <span class="inline_code">B</span>, and we insert <span class="inline_code">C</span>, then <span class="inline_code">A</span> &rarr; <span class="inline_code">C</span> and <span class="inline_code">C</span> &rarr; <span class="inline_code">B</span>.</li>
</ul>
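<p>The three rewiring examples above can be checked with a small sketch that works on a plain set of <span class="inline_code">(id1, id2)</span> tuples instead of a <span class="inline_code">Graph</span>. The helper names <span class="inline_code">redirect_edges()</span>, <span class="inline_code">cut_node()</span> and <span class="inline_code">insert_node()</span> are hypothetical and only mimic the documented behaviour; they are not the pattern.graph functions.</p>

```python
def redirect_edges(edges, node1, node2):
    """Reconnect node1's edges to node2, which removes node1 from the edge set."""
    moved = set()
    for a, b in edges:
        a = node2 if a == node1 else a
        b = node2 if b == node1 else b
        if a != b:  # drop the self-loop that appears if node1 and node2 were linked
            moved.add((a, b))
    return moved

def cut_node(edges, node):
    """Remove node and connect each of its predecessors to each of its successors."""
    before = [a for a, b in edges if b == node]
    after = [b for a, b in edges if a == node]
    kept = set((a, b) for a, b in edges if node not in (a, b))
    return kept | set((a, b) for a in before for b in after if a != b)

def insert_node(edges, node, a, b):
    """Replace the edge a -> b with a -> node and node -> b."""
    return (set(edges) - set([(a, b)])) | set([(a, node), (node, b)])

print(sorted(redirect_edges([('A', 'B'), ('C', 'D')], 'A', 'C')))     # [('C', 'B'), ('C', 'D')]
print(sorted(cut_node([('A', 'B'), ('B', 'C'), ('B', 'D')], 'B')))    # [('A', 'C'), ('A', 'D')]
print(sorted(insert_node([('A', 'B')], 'C', 'A', 'B')))               # [('A', 'C'), ('C', 'B')]
```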
<h3>Edge adjacency map</h3>
<p><span style="font-variant: normal;">The <span class="inline_code">adjacency()</span> function returns a map of linked nodes:</span><span class="smallcaps"><br /></span></p>
<pre class="brush:python; gutter:false; light:true;">adjacency(graph,
    directed = False,
    reversed = False,
    stochastic = False,
    heuristic = lambda node1, node2: 0)</pre><p>The return value is an&nbsp;<span class="inline_code">{id1:</span> <span class="inline_code">{id2:</span> <span class="inline_code">weight}}</span>&nbsp;dictionary with <span class="inline_code">Node.id</span>'s as keys, where each value is a dictionary of connected&nbsp;<span class="inline_code">Node.id</span>'s&nbsp;&rarr;&nbsp;<span class="inline_code">Edge.weight</span>.</p>
<p>If <span class="inline_code">directed=True</span>, edges are only traversable in one direction. If <span class="inline_code">stochastic=True</span>, the edge weights for all neighbors of a given node sum to <span class="inline_code">1.0</span>.&nbsp;The optional <span class="inline_code">heuristic</span> function takes two node id's and returns an additional cost (<span class="inline_code">0.0</span>-<span class="inline_code">1.0</span>) for traversing their edges.&nbsp;</p>
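<p>The shape of the returned dictionary, and the effect of <span class="inline_code">directed</span> and <span class="inline_code">stochastic</span>, can be illustrated with a simplified stand-in. The <span class="inline_code">adjacency_map()</span> helper below is hypothetical: it takes <span class="inline_code">(id1, id2, weight)</span> tuples, stores the raw weights, and leaves out <span class="inline_code">reversed</span> and <span class="inline_code">heuristic</span>, so it may differ from what <span class="inline_code">adjacency()</span> actually computes.</p>

```python
def adjacency_map(edges, directed=False, stochastic=False):
    """Return {id1: {id2: weight}} for a list of (id1, id2, weight) tuples."""
    links = {}
    for id1, id2, weight in edges:
        links.setdefault(id1, {})[id2] = weight
        links.setdefault(id2, {})
        if not directed:
            links[id2][id1] = weight   # undirected: store both directions
    if stochastic:
        # Scale each node's neighbor weights so that they sum to 1.0.
        for neighbors in links.values():
            total = sum(neighbors.values())
            if total > 0:
                for id2 in neighbors:
                    neighbors[id2] /= float(total)
    return links

edges = [('a', 'b', 1.0), ('a', 'c', 3.0)]
print(adjacency_map(edges, directed=True)['b'])    # {}
print(adjacency_map(edges, stochastic=True)['a'])  # 'b' gets 0.25, 'c' gets 0.75
```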
<h3>Edge traversal</h3>
<p>The <span class="inline_code">bfs()</span> function (breadth-first search) visits all nodes connected to the given <span class="inline_code">node</span> breadth-first, i.e., nearest nodes first. <br />The <span class="inline_code">dfs()</span> function (depth-first search) visits all nodes connected to the given <span class="inline_code">node</span> depth-first, i.e., as far as possible along each path before backtracking.</p>
<pre class="brush:python; gutter:false; light:true;">bfs(node, visit=lambda node: False, traversable=lambda node, edge: True)</pre><pre class="brush:python; gutter:false; light:true;">dfs(node, visit=lambda node: False, traversable=lambda node, edge: True)
</pre><p>The given&nbsp;<span class="inline_code">visit</span>&nbsp;function is called with each visited node. Traversal will stop if it returns <span class="inline_code">True</span>, and subsequently <span class="inline_code">bfs()</span> or <span class="inline_code">dfs()</span> will return <span class="inline_code">True</span>.</p>
<p>The given&nbsp;<span class="inline_code">traversable</span> function takes the visited&nbsp;<span class="inline_code">Node</span> and an&nbsp;<span class="inline_code">Edge</span> and returns <span class="inline_code">True</span> if we are allowed to follow this connection to the next node. For example, the traversable for directed edges:</p>
<div class="example">
<pre class="brush: python;gutter: false; light: true; fontsize: 100; first-line: 1; ">&gt;&gt;&gt; from pattern.graph import dfs
&gt;&gt;&gt;
&gt;&gt;&gt; def directed(node, edge):
&gt;&gt;&gt;     return node.id == edge.node1.id
&gt;&gt;&gt;
&gt;&gt;&gt; dfs(g['cat'], traversable=directed) # Start from a Node, not from the Graph.</pre></div>
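<p>The difference in visiting order, and the early stop when <span class="inline_code">visit</span> returns <span class="inline_code">True</span>, can be sketched in plain Python. The <span class="inline_code">breadth_first()</span> and <span class="inline_code">depth_first()</span> helpers below are hypothetical, work on a dictionary of node ids rather than <span class="inline_code">Node</span> objects, and leave out the <span class="inline_code">traversable</span> parameter.</p>

```python
from collections import deque

def breadth_first(links, start, visit=lambda node: False):
    """Visit nodes level by level; stop and return True once visit() does."""
    seen = set([start])
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if visit(node):
            return True
        for neighbor in links.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False

def depth_first(links, node, visit=lambda node: False, seen=None):
    """Follow each path as far as possible before backtracking."""
    seen = set() if seen is None else seen
    seen.add(node)
    if visit(node):
        return True
    for neighbor in links.get(node, []):
        if neighbor not in seen and depth_first(links, neighbor, visit, seen):
            return True
    return False

# Hypothetical directed links: a -> b -> d and a -> c.
links = {'a': ['b', 'c'], 'b': ['d'], 'c': [], 'd': []}
order = []

def record(node):
    order.append(node)
    return False  # never stop early

breadth_first(links, 'a', record)
print(order)  # ['a', 'b', 'c', 'd']
del order[:]
depth_first(links, 'a', record)
print(order)  # ['a', 'b', 'd', 'c']
print(breadth_first(links, 'a', lambda node: node == 'c'))  # True
```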
<p>&nbsp;</p>
<hr />
<h2><a name="canvas"></a>Visualization</h2>
<p>The pattern.graph module has a JavaScript counterpart (graph.js) that can be used to visualize a graph in a web page, as an&nbsp;HTML&nbsp;&lt;canvas&gt; element. The HTML &lt;canvas&gt; element allows dynamic, scriptable rendering of 2D shapes and bitmap images (see also Pattern's&nbsp;<a class="link-maintenance" href="pattern-canvas.html">canvas.js</a>).</p>
<p><span class="inline_code">Graph.export()</span> creates a new folder at the given <span class="inline_code">path</span>&nbsp;with an index.html (the visualization), a style.css, graph.js and canvas.js. The optional parameter <span class="inline_code">javascript</span>&nbsp;defines the URL path to graph.js and canvas.js (which will not be included in this case).</p>
<pre class="brush:python; gutter:false; light:true;">graph.export(path, encoding='utf-8', **kwargs)</pre><div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.graph import Graph
&gt;&gt;&gt;
&gt;&gt;&gt; g = Graph()
&gt;&gt;&gt; for n1, n2 in (
&gt;&gt;&gt;   ('cat', 'tail'), ('cat', 'purr'), ('purr', 'sound'),
&gt;&gt;&gt;   ('dog', 'tail'), ('dog', 'bark'), ('bark', 'sound')):
&gt;&gt;&gt;     g.add_node(n1)
&gt;&gt;&gt;     g.add_node(n2)
&gt;&gt;&gt;     g.add_edge(n1, n2, weight=0.0, type='is-related-to')
&gt;&gt;&gt;
&gt;&gt;&gt; g.export('sound', directed=True)</pre></div>
<p>Nodes and edges will be styled according to their <span class="inline_code">fill</span>, <span class="inline_code">stroke</span>, and <span class="inline_code">strokewidth</span>&nbsp;properties.</p>
<p>The following parameters can be used to customize the visualization:</p>
<table class="border">
<tbody>
<tr>
<td><span class="smallcaps">Parameter</span></td>
<td><span class="smallcaps">Default</span></td>
<td><span class="smallcaps">Description</span></td>
</tr>
<tr>
<td><span class="inline_code">javascript</span></td>
<td><span class="inline_code">''</span></td>
<td>Path to canvas.js&nbsp;and graph.js.</td>
</tr>
<tr>
<td><span class="inline_code">stylesheet</span></td>
<td class="inline_code"><span class="inline_code">INLINE</span></td>
<td>Path to CSS: <span class="inline_code">INLINE</span>,&nbsp;<span class="inline_code">DEFAULT</span>&nbsp;(generates style.css),&nbsp;<span class="inline_code">None</span>&nbsp;or path.</td>
</tr>
<tr>
<td><span class="inline_code">title</span></td>
<td><span class="inline_code">'Graph'</span></td>
<td>HTML&nbsp;<span class="inline_code"><span><span class="inline_code">&lt;title&gt;Graph&lt;/title&gt;</span>.</span></span></td>
</tr>
<tr>
<td><span class="inline_code">id</span></td>
<td><span class="inline_code">'graph'</span></td>
<td>HTML&nbsp;<span class="inline_code">&lt;div</span> <span class="inline_code">id="graph"&gt;</span>&nbsp;contains the&nbsp;<span class="inline_code">&lt;canvas&gt;</span>.</td>
</tr>
<tr>
<td style="border: 0; font-size: 0.5em;">&nbsp;</td>
</tr>
<tr>
<td><span class="inline_code">ctx</span></td>
<td><span class="inline_code">'canvas.element'</span></td>
<td>HTML <span class="inline_code">&lt;canvas&gt;</span> element to use for drawing.</td>
</tr>
<tr>
<td><span class="inline_code">width</span></td>
<td><span class="inline_code">700</span></td>
<td>Canvas width in pixels.</td>
</tr>
<tr>
<td><span class="inline_code">height</span></td>
<td><span class="inline_code">500</span></td>
<td>Canvas height in pixels.</td>
</tr>
<tr>
<td><span class="inline_code">frames</span></td>
<td><span class="inline_code">500</span></td>
<td>Number of frames of animation.</td>
</tr>
<tr>
<td><span class="inline_code">ipf</span></td>
<td><span class="inline_code">2</span></td>
<td><span class="inline_code">GraphLayout.update()</span> iterations per frame.</td>
</tr>
<tr>
<td style="border: 0; font-size: 0.5em;">&nbsp;</td>
</tr>
<tr>
<td><span class="inline_code">directed</span></td>
<td><span class="inline_code">False</span></td>
<td>Visualize eigenvector centrality as an edge arrow?</td>
</tr>
<tr>
<td><span class="inline_code">weighted</span></td>
<td><span class="inline_code">False</span></td>
<td>Visualize betweenness centrality as a node shadow?</td>
</tr>
<tr>
<td><span class="inline_code">pack</span></td>
<td><span class="inline_code">True</span></td>
<td>Shorten leaf edges + add node weight to node radius.</td>
</tr>
<tr>
<td style="border: 0; font-size: 0.5em;">&nbsp;</td>
</tr>
<tr>
<td><span class="inline_code">distance</span></td>
<td><span class="inline_code">graph.distance</span></td>
<td>Average edge length.</td>
</tr>
<tr>
<td><span class="inline_code">k</span></td>
<td><span class="inline_code">graph.k</span></td>
<td>Force constant.</td>
</tr>
<tr>
<td><span class="inline_code">force</span></td>
<td><span class="inline_code">graph.force</span></td>
<td>Force dampener.</td>
</tr>
<tr>
<td><span class="inline_code">repulsion</span></td>
<td><span class="inline_code">graph.repulsion</span></td>
<td>Force radius.</td>
</tr>
<tr>
<td style="border: 0; font-size: 0.5em;">&nbsp;</td>
</tr>
<tr>
<td><span class="inline_code">href</span></td>
<td><span class="inline_code">{}</span></td>
<td>Dictionary of <span class="inline_code">Node.id</span> =&gt; URL.</td>
</tr>
<tr>
<td><span class="inline_code">css</span></td>
<td><span class="inline_code">{}</span></td>
<td>Dictionary of <span class="inline_code">Node.id</span> =&gt; CSS classname.</td>
</tr>
</tbody>
</table>
<p>To export a static visualization, use <span class="inline_code">frames=1</span> and <span class="inline_code">ipf=0</span>.<br />&nbsp;</p>
<p class="smallcaps">Server-side scripting</p>
<p><span class="inline_code">Graph.serialize()</span> returns a string with (a portion of) the HTML, CSS and JavaScript source code of the visualization. It can be used to serve a dynamic web page.&nbsp;With <span class="inline_code">type=CANVAS</span>, it returns an HTML string with a <span class="inline_code">&lt;div</span> <span class="inline_code">id="graph"&gt;</span>&nbsp;that contains the canvas.js animation.&nbsp;With <span class="inline_code">type=DATA</span>, it returns a JavaScript string that initializes the <span class="inline_code">Graph</span> in variable&nbsp;<span class="inline_code">g</span>&nbsp;(which will draw to <span class="inline_code">ctx</span>).</p>
<pre class="brush: python;gutter: false; light: true; fontsize: 100; first-line: 1; ">graph.serialize(type=HTML, **kwargs) # HTML | CSS | CANVAS | DATA</pre><div class="example">
<pre class="brush: python;gutter: false; light: true; fontsize: 100; first-line: 1; ">&gt;&gt;&gt; import cherrypy
&gt;&gt;&gt; from pattern.graph import CANVAS
&gt;&gt;&gt;
&gt;&gt;&gt; class Visualization(object):
&gt;&gt;&gt;     def index(self):
&gt;&gt;&gt;         return (
&gt;&gt;&gt;             '&lt;html&gt;'
&gt;&gt;&gt;             '&lt;head&gt;'
&gt;&gt;&gt;             '&lt;script src="canvas.js"&gt;&lt;/script&gt;'
&gt;&gt;&gt;             '&lt;script src="graph.js"&gt;&lt;/script&gt;'
&gt;&gt;&gt;             '&lt;/head&gt;'
&gt;&gt;&gt;             '&lt;body&gt;' + g.serialize(CANVAS, directed=True) +
&gt;&gt;&gt;             '&lt;/body&gt;'
&gt;&gt;&gt;             '&lt;/html&gt;'
&gt;&gt;&gt;         )
&gt;&gt;&gt;     index.exposed = True
&gt;&gt;&gt;
&gt;&gt;&gt; cherrypy.quickstart(Visualization())</pre></div>
<p>&nbsp;</p>
<hr />
<h2><a name="javascript"></a>graph.js</h2>
<p>Below is a standalone demonstration of graph.js, without using&nbsp;<span class="inline_code">export()</span> or canvas.js. The <span class="inline_code">Graph.loop()</span> method fires the spring layout algorithm&nbsp;(<span class="link-maintenance"><a href="http://www.clips.ua.ac.be/media/pattern-graph/random" target="_blank">view live demo</a></span>).</p>
<p><img class="border" src="../g/pattern_graph4.jpg" alt="" width="610" height="390" /></p>
<div class="example">
<pre class="brush:xml; gutter:false; light:true;">&lt;!doctype html&gt;
&lt;html&gt;
&lt;head&gt;
&lt;meta charset="utf-8"&gt;
&lt;style&gt;
#graph { display: block; position: relative; overflow: hidden; }
#graph .node-label { font: 11px sans-serif; }
&lt;/style&gt;
&lt;script src="graph.js"&gt;&lt;/script&gt;
&lt;script&gt;
</pre></div>
<div class="example">
<pre class="brush: jscript;gutter: false; light: true; fontsize: 100; first-line: 1; ">    function spring() {
        SHADOW = 0.65 // slow...
        g = new Graph(document.getElementById("_ctx"));
        // Random nodes.
        for (var i=0; i &lt; 50; i++) {
            g.addNode(i+1);
        }
        // Random edges.
        for (var j=0; j &lt; 75; j++) {
            var n1 = choice(g.nodes);
            var n2 = choice(g.nodes);
            g.addEdge(n1, n2, {weight: Math.random()});
        }
        g.prune(0);
        g.betweennessCentrality();
        g.eigenvectorCentrality();
        g.loop({frames:500, fps:30, ipf:2, weighted:0.5, directed:true});
    }
</pre></div>
<div class="example">
<pre class="brush:xml; gutter:false; light:true;"> &lt;/script&gt;
&lt;/head&gt;
&lt;body onload="spring();"&gt;
&lt;div id="graph" style="width:700px; height:500px;"&gt;
&lt;canvas id="_ctx" width="700" height="500"&gt;&lt;/canvas&gt;
&lt;/div&gt;
&lt;/body&gt;
&lt;/html&gt; </pre></div>
<p>&nbsp;</p>
<hr />
<h2>See also</h2>
<ul>
<li><a href="http://gephi.org/" target="_blank">Gephi</a> (GPL): network analysis &amp; visualization GUI.</li>
<li><a href="http://networkx.lanl.gov/" target="_blank">NetworkX</a> (BSD): network analysis toolkit for Python + NumPy.</li>
<li><a href="http://www.cityinabottle.org/nodebox/" target="_blank">NodeBox</a> (BSD): graphics toolkit for Python + OpenGL.</li>
</ul>
</div>
</div></div>
</div>
</div>
</div>
</div>
</div>
</div>
<script>
SyntaxHighlighter.all();
</script>
</body>
</html>

@ -0,0 +1,613 @@
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>pattern-it</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link type="text/css" rel="stylesheet" href="../clips.css" />
<style>
/* Small fixes because we omit the online layout.css. */
h3 { line-height: 1.3em; }
#page { margin-left: auto; margin-right: auto; }
#header, #header-inner { height: 175px; }
#header { border-bottom: 1px solid #C6D4DD; }
table { border-collapse: collapse; }
#checksum { display: none; }
</style>
<link href="../js/shCore.css" rel="stylesheet" type="text/css" />
<link href="../js/shThemeDefault.css" rel="stylesheet" type="text/css" />
<script language="javascript" src="../js/shCore.js"></script>
<script language="javascript" src="../js/shBrushXml.js"></script>
<script language="javascript" src="../js/shBrushJScript.js"></script>
<script language="javascript" src="../js/shBrushPython.js"></script>
</head>
<body class="node-type-page one-sidebar sidebar-right section-pages">
<div id="page">
<div id="page-inner">
<div id="header"><div id="header-inner"></div></div>
<div id="content">
<div id="content-inner">
<div class="node node-type-page">
<div class="node-inner">
<div class="breadcrumb">View online at: <a href="http://www.clips.ua.ac.be/pages/pattern-it" class="noexternal" target="_blank">http://www.clips.ua.ac.be/pages/pattern-it</a></div>
<h1>pattern.it</h1>
<!-- Parsed from the online documentation. -->
<div id="node-1698" class="node node-type-page"><div class="node-inner">
<div class="content">
<p><span class="big">The pattern.it module contains a fast part-of-speech tagger for Italian (identifies nouns, adjectives, verbs, etc. in a sentence) and tools for Italian verb conjugation and noun singularization &amp; pluralization.</span></p>
<p>It can be used by itself or with other&nbsp;<a href="pattern.html">pattern</a>&nbsp;modules:&nbsp;<a href="pattern-web.html">web</a>&nbsp;|&nbsp;<a href="pattern-db.html">db</a>&nbsp;| <a href="pattern-en.html">en</a>&nbsp;|&nbsp;<a href="pattern-search.html">search</a>&nbsp;|&nbsp;<a href="pattern-vector.html">vector</a>&nbsp;|&nbsp;<a href="pattern-graph.html">graph</a>.</p>
<p><img src="../g/pattern_schema_it.gif" alt="" /></p>
<hr />
<h2>Documentation</h2>
<p>The functions in this module take the same parameters and return the same values as their counterparts in <a href="pattern-en.html">pattern.en</a>. Refer to the documentation there for more details.&nbsp;&nbsp;</p>
<h3>Gender</h3>
<p>Italian nouns and adjectives inflect according to gender. The <span class="inline_code">gender()</span> function predicts the gender (<span class="inline_code">MALE</span>, <span class="inline_code">FEMALE</span>,&nbsp;<span class="inline_code">PLURAL</span>) of&nbsp;a given noun with about 92% accuracy:&nbsp;</p>
<div class="example">
<pre class="brush: python;gutter: false; light: true; fontsize: 100; first-line: 1; ">&gt;&gt;&gt; from pattern.it import gender, MALE, FEMALE, PLURAL
&gt;&gt;&gt; print gender('gatti')
(MALE, PLURAL)</pre></div>
<h3>Article</h3>
<p>The <span class="inline_code">article()</span> function returns the article (<span class="inline_code">INDEFINITE</span> or <span class="inline_code">DEFINITE</span>) inflected by gender (e.g., <em><span style="text-decoration: underline;">il</span> gatto</em>&nbsp;&rarr; <em><span style="text-decoration: underline;">i</span> gatti</em>).</p>
<div class="example">
<pre class="brush: python;gutter: false; light: true; fontsize: 100; first-line: 1; ">&gt;&gt;&gt; from pattern.it import article, DEFINITE, MALE, PLURAL
&gt;&gt;&gt; print article('gatti', DEFINITE, gender=(MALE, PLURAL))
i</pre></div>
<h3>Noun singularization &amp; pluralization</h3>
<p>For Italian nouns there is <span class="inline_code">singularize()</span> and <span class="inline_code">pluralize()</span>.&nbsp;The implementation is slightly less robust than the English version (accuracy 84% for singularization and 93% for pluralization).</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.it import singularize, pluralize
&gt;&gt;&gt;
&gt;&gt;&gt; print singularize('gatti')
&gt;&gt;&gt; print pluralize('gatto')
gatto
gatti </pre></div>
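<p>A toy rule-based version shows why singularization can be harder than pluralization. The suffix table and function names below are assumptions for illustration, not the rules used by <span class="inline_code">pattern.it</span>. In this toy model both <span class="inline_code">-o</span> and <span class="inline_code">-e</span> pluralize to <span class="inline_code">-i</span>, so a plural in <span class="inline_code">-i</span> has two candidate singulars.</p>

```python
# Hypothetical singular -> plural endings for regular nouns.
PLURAL_SUFFIX = {"o": "i", "a": "e", "e": "i"}

def toy_pluralize(noun):
    """Swap the final vowel; leave other words (e.g. "bar") unchanged."""
    last = noun[-1:]
    if last in PLURAL_SUFFIX:
        return noun[:-1] + PLURAL_SUFFIX[last]
    return noun

def toy_singular_candidates(noun):
    """All singulars that toy_pluralize() would map onto the given plural."""
    last = noun[-1:]
    return sorted(noun[:-1] + s for s, p in PLURAL_SUFFIX.items() if p == last)

print(toy_pluralize("gatto"))            # gatti
print(toy_singular_candidates("gatti"))  # ['gatte', 'gatto']
```

<p>Choosing between such candidates needs extra knowledge, such as a lexicon or the noun's gender.</p>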
<h3>Verb conjugation</h3>
<p>For Italian verbs there are <span class="inline_code">conjugate()</span>, <span class="inline_code">lemma()</span>, <span class="inline_code">lexeme()</span> and <span class="inline_code">tenses()</span>.&nbsp;The lexicon for verb conjugation contains about 1,250 common Italian verbs, mined from Wiktionary. For unknown verbs it falls back to a rule-based approach with an accuracy of about 86%.</p>
<p>Italian verbs have more tenses than English verbs. In particular, the plural differs for each person, and there are additional forms for the&nbsp;<span class="inline_code">FUTURE</span>&nbsp;tense, the&nbsp;<span class="inline_code">IMPERATIVE</span>, <span class="inline_code">CONDITIONAL</span> and&nbsp;<span class="inline_code">SUBJUNCTIVE</span>&nbsp;mood and the&nbsp;<span class="inline_code">PERFECTIVE</span>&nbsp;aspect:</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.it import conjugate
&gt;&gt;&gt; from pattern.it import INFINITIVE, PRESENT, PAST, SG, SUBJUNCTIVE, PERFECTIVE
&gt;&gt;&gt;
&gt;&gt;&gt; print conjugate('sono', INFINITIVE)
&gt;&gt;&gt; print conjugate('sono', PRESENT, 1, SG, mood=SUBJUNCTIVE)
&gt;&gt;&gt; print conjugate('sono', PAST, 3, SG)
&gt;&gt;&gt; print conjugate('sono', PAST, 3, SG, aspect=PERFECTIVE)
essere
sia
era
fu </pre></div>
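<p>The lexicon-first, rules-second strategy described above can be sketched in a few lines. The mini-lexicon, the suffix rules and the name <span class="inline_code">toy_lemma()</span> are all hypothetical. They only illustrate the control flow, not the data or rules shipped with <span class="inline_code">pattern.it</span>.</p>

```python
# Tiny hypothetical lexicon: inflected form -> infinitive.
LEXICON = {"sono": "essere", "sia": "essere", "era": "essere", "fu": "essere"}

# Hypothetical fallback rules for regular -are verbs in the present tense.
# Longer suffixes come first so that "iamo" wins over "o".
FALLBACK_RULES = [("iamo", "are"), ("ate", "are"), ("ano", "are"), ("o", "are")]

def toy_lemma(verb):
    """Look the verb up in the lexicon, else guess with suffix rules."""
    verb = verb.lower()
    if verb in LEXICON:
        return LEXICON[verb]
    for suffix, replacement in FALLBACK_RULES:
        if verb.endswith(suffix):
            return verb[:-len(suffix)] + replacement
    return verb

print(toy_lemma("sono"))      # essere (from the lexicon)
print(toy_lemma("parliamo"))  # parlare (from the rules)
print(toy_lemma("dormo"))     # dormare (wrong: the infinitive is "dormire")
```

<p>The last call shows why a rule-based fallback is less reliable than the lexicon: the rules cannot tell which conjugation class an unknown verb belongs to.</p>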
<p>For <span class="inline_code">PAST</span>&nbsp;tense + <span class="inline_code">PERFECTIVE</span>&nbsp;aspect we can also use <span class="inline_code">PRETERITE</span>&nbsp;(<em>passato remoto</em>). For <span class="inline_code">PAST</span>&nbsp;tense + <span class="inline_code">IMPERFECTIVE</span>&nbsp;aspect we can also use <span class="inline_code">IMPERFECT</span>&nbsp;(<em>imperfetto</em>).</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.it import conjugate
&gt;&gt;&gt; from pattern.it import IMPERFECT, PRETERITE
&gt;&gt;&gt;
&gt;&gt;&gt; print conjugate('sono', IMPERFECT, 3, SG)
&gt;&gt;&gt; print conjugate('sono', PRETERITE, 3, SG)
era
fu </pre></div>
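<p>One way to picture these shorthands is as a lookup that expands a single name into a tense and aspect pair. The sketch below uses placeholder string constants and a hypothetical <span class="inline_code">expand()</span> helper. It does not reflect how <span class="inline_code">pattern.it</span> represents these constants internally.</p>

```python
# Placeholder constants for this sketch only.
PAST, PRESENT = "past", "present"
IMPERFECTIVE, PERFECTIVE = "imperfective", "perfective"
IMPERFECT, PRETERITE = "imperfect", "preterite"

# Each shorthand stands for a (tense, aspect) pair.
SHORTHAND = {
    IMPERFECT: (PAST, IMPERFECTIVE),
    PRETERITE: (PAST, PERFECTIVE),
}

def expand(tense, aspect=IMPERFECTIVE):
    """Return the (tense, aspect) pair, expanding a shorthand if one is given."""
    return SHORTHAND.get(tense, (tense, aspect))

print(expand(PRETERITE))                 # ('past', 'perfective')
print(expand(PAST, aspect=PERFECTIVE))   # ('past', 'perfective')
```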
<p>&nbsp;The <span class="inline_code">conjugate()</span> function takes the following optional parameters:</p>
<table class="border">
<tbody>
<tr>
<td class="smallcaps">Tense</td>
<td class="smallcaps">Person</td>
<td class="smallcaps">Number</td>
<td class="smallcaps">Mood</td>
<td class="smallcaps">Aspect</td>
<td class="smallcaps">Alias</td>
<td class="smallcaps">Example</td>
</tr>
<tr>
<td class="inline_code">INFINITIVE</td>
<td class="inline_code">None</td>
<td class="inline_code">None</td>
<td class="inline_code">None</td>
<td class="inline_code">None</td>
<td class="inline_code">"inf"</td>
<td><em>essere</em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1sg"</td>
<td><em>io&nbsp;<span style="text-decoration: underline;">sono</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sg"</td>
<td><em>tu&nbsp;<span style="text-decoration: underline;">sei</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sg"</td>
<td><em>lui&nbsp;<span style="text-decoration: underline;">è</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1pl"</td>
<td><em>noi&nbsp;<span style="text-decoration: underline;">siamo</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2pl"</td>
<td><em>voi&nbsp;<span style="text-decoration: underline;">siete</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3pl"</td>
<td><em>loro&nbsp;<span style="text-decoration: underline;">sono</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">None</td>
<td class="inline_code">None</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PROGRESSIVE</td>
<td class="inline_code">"part"</td>
<td><em>essendo</em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">IMPERATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sg!"</td>
<td><em>sii</em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">IMPERATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sg!"</td>
<td><em>sia</em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">IMPERATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1pl!"</td>
<td><em>siamo</em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">IMPERATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2pl!"</td>
<td><em>siate</em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">IMPERATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3pl!"</td>
<td><em>siano</em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1sg?"</td>
<td><em>io&nbsp;<span style="text-decoration: underline;">sia</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sg?"</td>
<td><em>tu&nbsp;<span style="text-decoration: underline;">sia</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sg?"</td>
<td><em>lui&nbsp;<span style="text-decoration: underline;">sia</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1pl?"</td>
<td><em>noi&nbsp;<span style="text-decoration: underline;">siamo</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2pl?"</td>
<td><em>voi&nbsp;<span style="text-decoration: underline;">siate</span></em></td>
</tr>
<tr>
<td class="inline_code">PRESENT</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3pl?"</td>
<td><em>loro&nbsp;<span style="text-decoration: underline;">siano</span></em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1sgp"</td>
<td><em>io&nbsp;<span style="text-decoration: underline;">ero</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sgp"</td>
<td><em>tu&nbsp;<span style="text-decoration: underline;">eri</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sgp"</td>
<td><em>lui&nbsp;<span style="text-decoration: underline;">era</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1ppl"</td>
<td><em>noi&nbsp;<span style="text-decoration: underline;">eravamo</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2ppl"</td>
<td><em>voi&nbsp;<span style="text-decoration: underline;">eravate</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3ppl"</td>
<td><em>loro&nbsp;<span style="text-decoration: underline;">erano</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">None</td>
<td class="inline_code">None</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PROGRESSIVE</td>
<td class="inline_code">"ppart"</td>
<td><em>stato</em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PERFECTIVE</td>
<td class="inline_code">"1sgp+"</td>
<td><em>io&nbsp;<span style="text-decoration: underline;">fui</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PERFECTIVE</td>
<td class="inline_code">"2sgp+"</td>
<td><em>tu&nbsp;<span style="text-decoration: underline;">fosti</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PERFECTIVE</td>
<td class="inline_code">"3sgp+"</td>
<td><em>lui&nbsp;<span style="text-decoration: underline;">fu</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PERFECTIVE</td>
<td class="inline_code">"1ppl+"</td>
<td><em>noi&nbsp;<span style="text-decoration: underline;">fummo</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PERFECTIVE</td>
<td class="inline_code">"2ppl+"</td>
<td><em>voi&nbsp;<span style="text-decoration: underline;">foste</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">PERFECTIVE</td>
<td class="inline_code">"3ppl+"</td>
<td><em>loro&nbsp;<span style="text-decoration: underline;">furono</span></em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1sgp?"</td>
<td><em>io&nbsp;<span style="text-decoration: underline;">fossi</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sgp?"</td>
<td><em>tu&nbsp;<span style="text-decoration: underline;">fossi</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sgp?"</td>
<td><em>lui&nbsp;<span style="text-decoration: underline;">fosse</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1ppl?"</td>
<td><em>noi&nbsp;<span style="text-decoration: underline;">fossimo</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2ppl?"</td>
<td><em>voi&nbsp;<span style="text-decoration: underline;">foste</span></em></td>
</tr>
<tr>
<td class="inline_code">PAST</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">SUBJUNCTIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3ppl?"</td>
<td><em>loro&nbsp;<span style="text-decoration: underline;">fossero</span></em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">FUTURE</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1sgf"</td>
<td><em>io&nbsp;<span style="text-decoration: underline;">sarò</span></em></td>
</tr>
<tr>
<td class="inline_code">FUTURE</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sgf"</td>
<td><em>tu&nbsp;<span style="text-decoration: underline;">sarai</span></em></td>
</tr>
<tr>
<td class="inline_code">FUTURE</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sgf"</td>
<td><em>lui&nbsp;<span style="text-decoration: underline;">sarà</span></em></td>
</tr>
<tr>
<td class="inline_code">FUTURE</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1plf"</td>
<td><em>noi&nbsp;<span style="text-decoration: underline;">saremo</span></em></td>
</tr>
<tr>
<td class="inline_code">FUTURE</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2plf"</td>
<td><em>voi&nbsp;<span style="text-decoration: underline;">sarete</span></em></td>
</tr>
<tr>
<td class="inline_code">FUTURE</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3plf"</td>
<td><em>loro&nbsp;<span style="text-decoration: underline;">saranno</span></em></td>
</tr>
<tr>
<td style="border-left: 0; border-right: 0; padding: 0;">&nbsp;</td>
</tr>
<tr>
<td class="inline_code">CONDITIONAL</td>
<td class="inline_code">1</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1sg-&gt;"</td>
<td><em>io&nbsp;<span style="text-decoration: underline;">sarei</span></em></td>
</tr>
<tr>
<td class="inline_code">CONDITIONAL</td>
<td class="inline_code">2</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2sg-&gt;"</td>
<td><em>tu&nbsp;<span style="text-decoration: underline;">saresti</span></em></td>
</tr>
<tr>
<td class="inline_code">CONDITIONAL</td>
<td class="inline_code">3</td>
<td class="inline_code">SG</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3sg-&gt;"</td>
<td><em>lui&nbsp;<span style="text-decoration: underline;">sarebbe</span></em></td>
</tr>
<tr>
<td class="inline_code">CONDITIONAL</td>
<td class="inline_code">1</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"1pl-&gt;"</td>
<td><em>noi&nbsp;<span style="text-decoration: underline;">saremmo</span></em></td>
</tr>
<tr>
<td class="inline_code">CONDITIONAL</td>
<td class="inline_code">2</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"2pl-&gt;"</td>
<td><em>voi&nbsp;<span style="text-decoration: underline;">sareste</span></em></td>
</tr>
<tr>
<td class="inline_code">CONDITIONAL</td>
<td class="inline_code">3</td>
<td class="inline_code">PL</td>
<td class="inline_code">INDICATIVE</td>
<td class="inline_code">IMPERFECTIVE</td>
<td class="inline_code">"3pl-&gt;"</td>
<td><em>loro&nbsp;<span style="text-decoration: underline;">sarebbero</span></em></td>
</tr>
</tbody>
</table>
<p>Instead of separate optional parameters, a single short alias (e.g., <span class="inline_code">"3sgp+"</span>), or&nbsp;<span class="inline_code">PARTICIPLE</span> or <span class="inline_code">PAST+PARTICIPLE</span>, can also be given. With no parameters, the infinitive form of the verb is returned.</p>
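<p>The alias column of the table above follows a regular scheme, which the following sketch rebuilds as a dictionary. The string values stand in for the module's constants, and <span class="inline_code">build_aliases()</span> is a hypothetical helper written for this illustration. The conditional rows follow the table as printed.</p>

```python
from itertools import product

def build_aliases():
    """Map each alias in the table to (tense, person, number, mood, aspect)."""
    table = {
        "inf": ("infinitive", None, None, None, None),
        "part": ("present", None, None, "indicative", "progressive"),
        "ppart": ("past", None, None, "indicative", "progressive"),
    }
    for person, number in product((1, 2, 3), ("sg", "pl")):
        base = "%d%s" % (person, number)
        # Past aliases put the "p" after "sg" but before "pl": "1sgp", "1ppl".
        past = "%dsgp" % person if number == "sg" else "%dppl" % person
        rows = {
            base: ("present", "indicative", "imperfective"),
            base + "!": ("present", "imperative", "imperfective"),
            base + "?": ("present", "subjunctive", "imperfective"),
            past: ("past", "indicative", "imperfective"),
            past + "+": ("past", "indicative", "perfective"),
            past + "?": ("past", "subjunctive", "imperfective"),
            base + "f": ("future", "indicative", "imperfective"),
            base + "->": ("conditional", "indicative", "imperfective"),
        }
        for alias, (tense, mood, aspect) in rows.items():
            table[alias] = (tense, person, number, mood, aspect)
    # The table lists no first person singular imperative.
    del table["1sg!"]
    return table

ALIASES = build_aliases()
print(len(ALIASES))      # 50, one entry per row of the table
print(ALIASES["3sgp+"])  # ('past', 3, 'sg', 'indicative', 'perfective')
```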
<h3>Attributive &amp; predicative adjectives</h3>
<p>Italian adjectives inflect with the suffixes&nbsp;<span class="inline_code">-o</span>&nbsp;&rarr; <span class="inline_code">-i</span>&nbsp;(masculine) and&nbsp;<span class="inline_code">-a</span>&nbsp;&rarr; <span class="inline_code">-e</span>&nbsp;(feminine), with some exceptions (e.g., <em>grande</em>&nbsp;&rarr; <em>i grandi felini</em>). You can get the base form with the <span class="inline_code">predicative()</span> function. A statistical approach is used with an accuracy of 88%.</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.it import attributive
&gt;&gt;&gt; print predicative('grandi')
grande </pre></div>
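<p>As a rough illustration of recovering a base form, the sketch below generates candidate endings and prefers a candidate found in a small word list. This is a swapped-in technique for illustration only, not the statistical approach mentioned above. The word list, the ending table and <span class="inline_code">toy_predicative()</span> are hypothetical.</p>

```python
# Hypothetical mini-lexicon of base (masculine singular) forms.
KNOWN_ADJECTIVES = {"grande", "nero", "felice"}

# Inflected ending -> possible base endings.
BASE_ENDINGS = {"o": ("o",), "a": ("o",), "e": ("o", "e"), "i": ("o", "e")}

def toy_predicative(adjective):
    """Return a plausible base form for an inflected adjective."""
    stem, last = adjective[:-1], adjective[-1:]
    candidates = [stem + ending for ending in BASE_ENDINGS.get(last, ())]
    for candidate in candidates:
        if candidate in KNOWN_ADJECTIVES:
            return candidate
    # Unknown word: take the first candidate, or return the input unchanged.
    return candidates[0] if candidates else adjective

print(toy_predicative("grandi"))  # grande
print(toy_predicative("nere"))    # nero
```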
<h3>Parser</h3>
<p>For parsing there are <span class="inline_code" style="font-family: Courier, monospace; font-size: 12px;">parse()</span>, <span class="inline_code">parsetree()</span> and&nbsp;<span class="inline_code" style="font-family: Courier, monospace; font-size: 12px;">split()</span>. The <span class="inline_code">parse()</span> function annotates words in the given string with their part-of-speech <a class="link-maintenance" href="mbsp-tags.html">tags</a>&nbsp;(e.g., <span class="postag">NN</span> for nouns and <span class="postag">VB</span> for verbs). The <span class="inline_code">parsetree()</span> function takes a string and returns a tree of nested objects (<span class="inline_code">Text</span>&nbsp;&rarr; <span class="inline_code">Sentence</span>&nbsp;&rarr; <span class="inline_code">Chunk</span>&nbsp;&rarr; <span class="inline_code">Word</span>). The <span class="inline_code">split()</span> function takes the output of <span class="inline_code">parse()</span> and returns a <span class="inline_code">Text</span>.&nbsp;See the <span class="inline_code">pattern.en</span> documentation&nbsp;(<span class="link-maintenance" style="color: #78aaff;"><a style="color: #8caaff; outline-style: none !important; outline-width: initial !important; outline-color: initial !important;" href="pattern-en.html#tree">here</a></span>) for how to manipulate <span class="inline_code">Text</span>&nbsp;objects.</p>
<div class="example">
<pre class="brush:python; gutter:false; light:true;">&gt;&gt;&gt; from pattern.it import parse, split
&gt;&gt;&gt;
&gt;&gt;&gt; s = parse('Il gatto nero faceva le fusa.')
&gt;&gt;&gt; for sentence in split(s):
&gt;&gt;&gt;     print sentence
Sentence('Il/DT/B-NP/O gatto/NN/I-NP/O nero/JJ/I-NP/O'
'faceva/VB/B-VP/O'
'le/DT/B-NP/O fusa/NN/I-NP/O ././O/O')
</pre></div>
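<p>The tagged string can also be inspected without any library. The sketch below assumes that each token carries four slash-separated fields (word, part-of-speech tag, chunk tag, PNP tag), as in the output shown above. <span class="inline_code">read_tagged()</span> and <span class="inline_code">chunks()</span> are hypothetical helpers, not part of <span class="inline_code">pattern.it</span>.</p>

```python
TAGGED = ("Il/DT/B-NP/O gatto/NN/I-NP/O nero/JJ/I-NP/O "
          "faceva/VB/B-VP/O le/DT/B-NP/O fusa/NN/I-NP/O ././O/O")

def read_tagged(s):
    """Split a tagged string into (word, tag, chunk, pnp) tuples."""
    # rsplit from the right so that a "/" inside the word itself survives.
    return [tuple(token.rsplit("/", 3)) for token in s.split()]

def chunks(tokens):
    """Group words into chunks using the B-/I-/O prefixes of the chunk tag."""
    result = []
    for word, tag, chunk, pnp in tokens:
        if chunk.startswith("B-"):
            result.append((chunk[2:], [word]))   # B- opens a new chunk
        elif chunk.startswith("I-") and result:
            result[-1][1].append(word)           # I- continues the last chunk
    return result

tokens = read_tagged(TAGGED)
print(tokens[0])  # ('Il', 'DT', 'B-NP', 'O')
for label, words in chunks(tokens):
    print(label, " ".join(words))
```

<p>For the example sentence this groups the words into the noun phrase <em>Il gatto nero</em>, the verb phrase <em>faceva</em> and the noun phrase <em>le fusa</em>. The final period is tagged <span class="inline_code">O</span> and belongs to no chunk.</p>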
<p>The parser's lexicon is mined from Wiktionary.&nbsp;Its accuracy is around 92%.</p>
<h3>Sentiment analysis</h3>
<p>There's no&nbsp;<span class="inline_code">sentiment()</span> function for Italian yet.</p>
</div>
</div></div>
</div>
</div>
</div>
</div>
</div>
</div>
<script>
SyntaxHighlighter.all();
</script>
</body>
</html>
