{
"cells": [
{
"cell_type": "markdown",
"id": "9da34557-037a-436f-bbaa-d332057cbe18",
"metadata": {},
"source": [
"a tree extending to the web, pasted with images of the named fruit. the process is done by making a request to the web and scraping an image returned from the request. \n",
"\n",
"modified from the hanging tree program "
]
},
{
"cell_type": "code",
"execution_count": 129,
"id": "01acaa97-796a-4e34-9ee3-a46e71493d9d",
"metadata": {},
"outputs": [],
"source": [
"fruit_list = [\"apricot\",\"blood orange\",\"currant\",\"durian\",\"egg fruit\",\"fig\",\"guava\",\n",
" \"hawthorne\",\"jujube\",\"kiwi\",\"lychee\",\"mandarin\",\"nectarine\",\"olive\",\"persimmon\",\"quandong\",\"rambutan\",\"star fruit\",\n",
" \"tangor\",\"ugli fruit\",\"vanilla\",\"water chestnut\",\"ximenia\",\"yuzu\",\"zhe\"]\n",
"# additionally: longan yumberry sugarcane "
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "a827df76-4293-4c7f-9f65-7c9c3e4f6b4f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"https://media.geeksforgeeks.org/wp-content/cdn-uploads/write_ndi_20210312.svg\n",
"https://media.geeksforgeeks.org/wp-content/cdn-uploads/practice_ndi_20210312.svg\n",
"https://media.geeksforgeeks.org/wp-content/cdn-uploads/premium_ndi_20210312.svg\n",
"https://media.geeksforgeeks.org/wp-content/cdn-uploads/jobs_ndi_20210312.svg\n",
"https://media.geeksforgeeks.org/wp-content/cdn-uploads/20220228124519/Artboard-6-min.png\n",
"https://videocdn.geeksforgeeks.org/geeksforgeeks/GunDetectionUsingPythonOpenCV/gundetectionusingpython20220310130800-small.png\n",
"https://videocdn.geeksforgeeks.org/geeksforgeeks/HowtoPerformCRUDOperationsinFlutterApplicationusingMySQLDatabase/HowtoPerformCRUDOperationsinFlutterApplicationusingMySQLDatabase20220304112711.jpg\n",
"https://videocdn.geeksforgeeks.org/geeksforgeeks/ImplementingSimpleandCustomAlertDialoginFlutterApplication/AlertDialoginFlutterApplication20220303121322-small.png\n",
"https://videocdn.geeksforgeeks.org/geeksforgeeks/BuildaSimpleStopwatchApplicationinFlutter/BuildaSimpleStopwatchApplicationinFlutter20220302114511-small.png\n",
"https://videocdn.geeksforgeeks.org/geeksforgeeks/MLTreatingCategoricalDataImplementationinPython/TreatingCategoricalDataImplementationinPython20220301102144-small.png\n",
"https://videocdn.geeksforgeeks.org/geeksforgeeks/GettingStartedwithMaterialDesigninFlutter/MaterialDesigninFlutter20220228105528-small.png\n",
"https://videocdn.geeksforgeeks.org/geeksforgeeks/HowtoBuildaToDoApplicationinFlutter/HowtoBuildaToDoApplicationinFlutter20220226163539-small.png\n",
"https://videocdn.geeksforgeeks.org/geeksforgeeks/HowtoUseGoogleSigninWithFirebaseinFlutter/HowtoUseGoogleSigninWithFirebaseinFlutter20220225131450-small.png\n",
"https://videocdn.geeksforgeeks.org/geeksforgeeks/ImplementingSwipetoUnlockFeatureinFlutter/HowtoImplementSwipetoUnlockFeatureinFlutterApplication20220224115856-small.png\n",
"https://videocdn.geeksforgeeks.org/geeksforgeeks/MachineLearningHandlingCategoricalData/MachineLearningHandlingCategoricalData20220223123041-small.png\n",
"https://media.geeksforgeeks.org/wp-content/post-ads-banner/2021-12-29-11-18-16-DSA_Ad_icon (1).png\n",
"https://media.geeksforgeeks.org/wp-content/post-ads-banner/2021-12-29-16-30-50-CIP_Icon.png\n",
"https://media.geeksforgeeks.org/wp-content/post-ads-banner/2021-12-29-11-27-51-SD Icon.png\n",
"\n"
]
}
],
"source": [
"# for each word in list, make a request to search that keyword in a search engine\n",
"import requests\n",
"from bs4 import BeautifulSoup\n",
"\n",
"def getdata(url):\n",
" r = requests.get(url)\n",
" return r.text\n",
"\n",
"htmldata = getdata(\"https://www.geeksforgeeks.org/\")\n",
"soup = BeautifulSoup(htmldata, 'html.parser')\n",
"for item in soup.find_all('img'):\n",
" print(item['src'])\n"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "df31e89e-7a79-41f7-b5a0-bbca56374e41",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"https://duckduckgo.com/?q=apple&atb=v315-5&iar=images&iax=images&ia=images\n"
]
}
],
"source": [
"# Import the beautifulsoup\n",
"# and request libraries of python.\n",
"import requests\n",
"import bs4\n",
"\n",
"# Make two strings with default google search URL\n",
"# 'https://google.com/search?q=' and\n",
"# our customized search keyword.\n",
"# Concatenate them\n",
"text= \"apple\"\n",
"url = 'https://duckduckgo.com/?q='+ text + '&atb=v315-5&iar=images&iax=images&ia=images' \n",
"print(url)\n",
"#url = 'https://duckduckgo.com/?q=apple&atb=v315-5&iar=images&iax=images&ia=images'\n",
"# Fetch the URL data using requests.get(url),\n",
"# store it in a variable, request_result.\n",
"#request_result=requests.get( url )\n",
"\n",
"# Creating soup from the fetched request\n",
"#soup = bs4.BeautifulSoup(request_result.text,\n",
" # \"html.parser\")\n",
"#print(soup)\n"
]
},
{
"cell_type": "code",
"execution_count": 130,
"id": "08ab8673-9b5c-4bd0-9fac-72796e831b94",
"metadata": {},
"outputs": [],
"source": [
"# build the fruit motley tree with extra utilities than the letter tree"
]
},
{
"cell_type": "code",
"execution_count": 131,
"id": "9c7187e4-0c49-4908-a169-775e6e475f94",
"metadata": {},
"outputs": [],
"source": [
"class letterLeaf:\n",
" def __init__(self,letter,wordFruit):\n",
" self.leftAlphabet = None\n",
" self.rightAlphabet = None\n",
" self.letter = letter\n",
" # try using a list structure to contain the words in this node? \n",
" self.wordFruit = wordFruit"
]
},
{
"cell_type": "code",
"execution_count": 132,
"id": "11dbf280-6c61-4020-bfe7-e85a723697db",
"metadata": {},
"outputs": [],
"source": [
"# printing tree utility \n",
"# this segment is modified from Shubham Singh(SHUBHAMSINGH10)'s contribution \n",
"\n",
"# spacer\n",
"COUNT = [10]\n",
"\n",
"# print a flat lying tree\n",
"# speculation this is a recursion that prints the right leaf until there is nothing left\n",
"def print2DUtil_flat(root, space) :\n",
" # Base case\n",
" if (root == None) :\n",
" return\n",
" # Increase distance between levels\n",
" space += COUNT[0]\n",
" # Process right leaf/branch/child first\n",
" print2DUtil_flat(root.rightAlphabet, space)\n",
" print()\n",
" for i in range(COUNT[0], space):\n",
" print(end = \" \")\n",
" print(root.letter)\n",
" \n",
" for i in range(COUNT[0], space):\n",
" print(end = \" \")\n",
" #print(root.letter) \n",
" print(root.wordFruit)\n",
" # Process left child\n",
" print2DUtil_flat(root.leftAlphabet, space)\n",
"\n",
" # Wrapper over print2DUtil()\n",
"def print2D(root) :\n",
" #Pass initial space count as 0\n",
" print(\"here is a tree that's laying on the ground: \")\n",
" print2DUtil_flat(root, 0)\n"
]
},
{
"cell_type": "code",
"execution_count": 133,
"id": "d0bcd376-491e-48e9-8bd9-e10d91346d7f",
"metadata": {},
"outputs": [],
"source": [
"#the input was for an interactive version like text input used by wang, save for later \n",
"def grepFirstLetter(word):\n",
" #word = input()\n",
" firstLetter = word[0]\n",
" return firstLetter\n",
" #print(\"the letter starts with : {}, and will be inserted under the {} leaf\".format(firstLetter, firstLetter))"
]
},
{
"cell_type": "code",
"execution_count": 134,
"id": "7a12a00f-f7b0-4234-a06c-54c6f3d1daf1",
"metadata": {},
"outputs": [],
"source": [
"# it will be parsed from the fruit basket\n",
"# pick a fruit\n",
"# hang onto tree\n",
"# parse the string letter by using the grepFirstLetter\n",
"def insertLeaf(root,wordFruit):\n",
" #create new leaf \n",
" letter = grepFirstLetter(wordFruit)\n",
" #print(\"first letter of {} is : {} \".format(wordFruit, letter))\n",
" #creating a new node containing firstLetter and wordFruit\n",
" newleaf = letterLeaf(letter,wordFruit)\n",
" #print(\"test print attributes {} {}\".format(newleaf.letter, newleaf.wordFruit))\n",
" # python pointer implementation\n",
" # a root pointer \n",
" x = root\n",
" # pointer y maintains the trailing\n",
" # pointer of x\n",
" # Pointer to start traversing from root\n",
" # and traverses downward path to search\n",
" # where the new node to be inserted\n",
" x = root\n",
"\n",
" # Pointer y maintains the trailing\n",
" # pointer of x\n",
" y = None\n",
"\n",
" while (x != None):\n",
" y = x\n",
" if (letter < x.letter):\n",
" x = x.leftAlphabet\n",
" else:\n",
" x = x.rightAlphabet\n",
" \n",
" # If the root is None i.e the tree is\n",
" # empty. The new node is the root node\n",
" if (y == None):\n",
" y = newleaf\n",
"\n",
" # If the new key is less then the leaf node key\n",
" # Assign the new node to be its left child\n",
" elif (letter < y.letter):\n",
" y.leftAlphabet = newleaf\n",
"\n",
" # else assign the new node its\n",
" # right child\n",
" else:\n",
" y.rightAlphabet = newleaf\n",
"\n",
" # Returns the pointer where the\n",
" # new node is inserted\n",
" return y\n",
"\n",
"\n",
"# A utility function to do inorder\n",
"# traversal of BST"
]
},
{
"cell_type": "code",
"execution_count": 135,
"id": "dc2230e9-0831-4e3c-93b8-c96d20fd0525",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"here is a tree that's laying on the ground: \n",
"\n",
" z\n",
" zhe\n",
"\n",
" y\n",
" yuzu\n",
"\n",
" x\n",
" ximenia\n",
"\n",
"w\n",
"water chestnut\n",
"\n",
" v\n",
" vanilla\n",
"\n",
" u\n",
" ugli fruit\n",
"\n",
" t\n",
" tangor\n",
"\n",
" s\n",
" star fruit\n",
"\n",
" r\n",
" rambutan\n",
"\n",
" q\n",
" quandong\n",
"\n",
" p\n",
" persimmon\n",
"\n",
" o\n",
" olive\n",
"\n",
" n\n",
" nectarine\n",
"\n",
" m\n",
" mandarin\n",
"\n",
" l\n",
" lychee\n",
"\n",
" k\n",
" kiwi\n",
"\n",
" j\n",
" jujube\n",
"\n",
" h\n",
" hawthorne\n",
"\n",
" g\n",
" guava\n",
"\n",
" f\n",
" fig\n",
"\n",
" e\n",
" egg fruit\n",
"\n",
" d\n",
" durian\n",
"\n",
" c\n",
" currant\n",
"\n",
" b\n",
" blood orange\n",
"\n",
" a\n",
" apricot\n"
]
}
],
"source": [
"# same deal, insert everything in the list until it's empty \n",
"import random\n",
"\n",
"root = None\n",
"# pick a random letter in the alphabet\n",
"random_fruit = random.choice(fruit_list)\n",
"#print(random_letter)\n",
"#insert it into the tree, insert the first one \n",
"root = insertLeaf(root, random_fruit)\n",
"# remove that letter from list\n",
"fruit_list.remove(random_fruit)\n",
"#print(fruit_list)\n",
"len_list = (len(fruit_list))\n",
"#print(len_list)\n",
"while len_list > 0:\n",
" random_fruit = random.choice(fruit_list)\n",
" insertLeaf(root,random_fruit)\n",
" fruit_list.remove(random_fruit)\n",
" #print(\"inserting and removing letter {} \".format(random_letter))\n",
" len_list -= 1\n",
"# keep inserting until the list is empty \n",
"# print tree \n",
"print2D(root)\n",
"# can try multiple times for different tree configurations\n"
]
},
{
"cell_type": "code",
"execution_count": 136,
"id": "6fc4208f-8088-476d-b20d-0f0ac0e85066",
"metadata": {},
"outputs": [],
"source": [
"# fruits in structured presetations:\n",
"# https://zhuanlan.zhihu.com/p/113457497\n",
"# https://www.wordmom.com/fruits/that-start-with-w"
]
},
{
"cell_type": "markdown",
"id": "5b11e6fc-0443-4373-8530-5f2b2f1b0aa7",
"metadata": {},
"source": [
"During a potluck dinner in Beijing Adel brought an dish made from pomegrante seeds. It was in December, the crowd was not used to the fruit salad dish. Adel was the only Iranian there. A talented cook as Adel was, the dish was barely touched. \n",
"Adel, I think you would agree with me that international potlucks are as bad as they can be. Let's hang the fruits high up - trees are good to store and access memories. For the pomegrantes seeds that I've missed that evening. "
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}