You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

439 lines
11 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# MediaWiki API (part 2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook:\n",
"\n",
"* uses the `query` & `parse` actions of the `MediaWiki API`, which we can use to work with wiki pages as (versioned and hypertextual) technotexts\n",
"\n",
"## Epicpedia\n",
"\n",
"Reference: Epicpedia (2008), Annemieke van der Hoek \\\n",
"(from: https://diversions.constantvzw.org/wiki/index.php?title=Eventual_Consistency#Towards_diffractive_technotexts)\n",
"\n",
"> In Epicpedia (2008), Annemieke van der Hoek creates a work that makes use of the underlying history that lies beneath the surface of each Wikipedia article.[20] Inspired by the work of Berthold Brecht and the notion of Epic Theater, Epicpedia presents Wikipedia articles as screenplays, where each edit becomes an utterance performed by a cast of characters (both major and minor) that takes place over a span of time, typically many years. The work uses the API of wikipedia to retrieve for a given article the sequence of revisions, their corresponding user handles, the summary message (that allows editors to describe the nature of their edit), and the timestamp to then produce a differential reading. \n",
"\n",
"![](https://diversions.constantvzw.org/wiki/images/b/b0/Epicpedia_EpicTheater02.png)\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import urllib\n",
"import json\n",
"from IPython.display import JSON # iPython JSON renderer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Query & Parse"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will work again with the `Dérive` page on the wiki: https://pzwiki.wdka.nl/mediadesign/D%C3%A9rive (i moved it here, to make the URL a bit simpler)\n",
"\n",
"And use the `API help page` on the PZI wiki as our main reference: https://pzwiki.wdka.nl/mw-mediadesign/api.php"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# query the wiki page Dérive\n",
"request = 'https://pzwiki.wdka.nl/mw-mediadesign/api.php?action=query&titles=D%C3%A9rive&format=json'\n",
"response = urllib.request.urlopen(request).read()\n",
"data = json.loads(response)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"JSON(data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# parse the wiki page Dérive\n",
"request = 'https://pzwiki.wdka.nl/mw-mediadesign/api.php?action=parse&page=D%C3%A9rive&format=json'\n",
"response = urllib.request.urlopen(request).read()\n",
"data = json.loads(response)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"JSON(data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Links, contributors, edit history"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can ask the API for different kind of material/information about the page.\n",
"\n",
"Such as:\n",
"\n",
"* a list of wiki links\n",
"* a list of external links\n",
"* a list of images\n",
"* a list of edits\n",
"* a list of contributors\n",
"* page information\n",
"* reverse links (What links here?)\n",
"* ...\n",
"\n",
"We can use the query action again, to ask for these things:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# wiki links: prop=links\n",
"request = 'https://pzwiki.wdka.nl/mw-mediadesign/api.php?action=query&prop=links&titles=D%C3%A9rive&format=json'\n",
"response = urllib.request.urlopen(request).read()\n",
"data = json.loads(response)\n",
"JSON(data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# external links: prop=extlinks\n",
"request = 'https://pzwiki.wdka.nl/mw-mediadesign/api.php?action=query&prop=extlinks&titles=D%C3%A9rive&format=json'\n",
"response = urllib.request.urlopen(request).read()\n",
"data = json.loads(response)\n",
"JSON(data)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"application/json": {
"batchcomplete": "",
"query": {
"pages": {
"33524": {
"images": [
{
"ns": 6,
"title": "File:Debo 009 05 01.jpg"
},
{
"ns": 6,
"title": "File:Sex-majik-2004.gif"
}
],
"ns": 0,
"pageid": 33524,
"title": "Dérive"
}
}
}
},
"text/plain": [
"<IPython.core.display.JSON object>"
]
},
"execution_count": 3,
"metadata": {
"application/json": {
"expanded": false,
"root": "root"
}
},
"output_type": "execute_result"
}
],
"source": [
"# images: prop=images\n",
"request = 'https://pzwiki.wdka.nl/mw-mediadesign/api.php?action=query&prop=images&titles=D%C3%A9rive&format=json'\n",
"response = urllib.request.urlopen(request).read()\n",
"data = json.loads(response)\n",
"JSON(data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# edit history: prop=revisions\n",
"request = 'https://pzwiki.wdka.nl/mw-mediadesign/api.php?action=query&prop=revisions&titles=D%C3%A9rive&format=json'\n",
"response = urllib.request.urlopen(request).read()\n",
"data = json.loads(response)\n",
"JSON(data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# contributors: prop=contributors\n",
"request = 'https://pzwiki.wdka.nl/mw-mediadesign/api.php?action=query&prop=contributors&titles=D%C3%A9rive&format=json'\n",
"response = urllib.request.urlopen(request).read()\n",
"data = json.loads(response)\n",
"JSON(data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# page information: prop=info\n",
"request = 'https://pzwiki.wdka.nl/mw-mediadesign/api.php?action=query&prop=info&titles=D%C3%A9rive&format=json'\n",
"response = urllib.request.urlopen(request).read()\n",
"data = json.loads(response)\n",
"JSON(data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# reverse links (What links here?): prop=linkshere + lhlimit=25 (max. nr of results)\n",
"request = 'https://pzwiki.wdka.nl/mw-mediadesign/api.php?action=query&prop=linkshere&lhlimit=100&titles=Prototyping&format=json'\n",
"response = urllib.request.urlopen(request).read()\n",
"data = json.loads(response)\n",
"JSON(data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use the `data` responses in Python (and save data in variables)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# For example with the action=parse request\n",
"request = 'https://pzwiki.wdka.nl/mw-mediadesign/api.php?action=parse&page=D%C3%A9rive&format=json'\n",
"response = urllib.request.urlopen(request).read()\n",
"data = json.loads(response)\n",
"JSON(data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"text = data['parse']['text']['*']\n",
"print(text)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"title = data['parse']['title']\n",
"print(title)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"images = data['parse']['images']\n",
"print(images)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use these variables to generate HTML pages "
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [],
"source": [
"# open a HTML file to write to \n",
"output = open('myfilename.html', 'w')"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2813"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# write to this HTML file you just opened\n",
"output.write(text)"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [],
"source": [
"# close the file again (Jupyter needs this to actually write a file)\n",
"output.close()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use these variables to generate HTML pages (using the template language Jinja)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Jinja (template language): https://jinja.palletsprojects.com/"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 4
}