From 29636e5764682c5bcdaad3e413f26d6fe6db4710 Mon Sep 17 00:00:00 2001
From: rita
Date: Mon, 17 Aug 2020 20:50:15 +0200
Subject: [PATCH] Upload files to ''

---
 README.md | 9 +++++++++
 1 file changed, 9 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..58deec4
--- /dev/null
+++ b/README.md
@@ -0,0 +1,9 @@
+# Categorisation of text files
+
+The actions of categorising and cataloging happen in the most mundane activities, but they are not innocent. They translate values and certain visions of the world.
+
+In the Rietveld Academy Library, we saw how the librarians are challenging the Library of Congress classification. With Dušan we browsed the Monoskop Index, an interesting combination of a “book index, library catalog, and tag cloud”.
+
+With this script, I was experimenting with an automated classification of text files. The script searches for the three most common words in the text and tries to match these words to a category. For example, if one of the most common words is “books”, the category of the text is considered “Library Studies”. The same would happen with the words “archives”, “author”, “bibliographic”, “bibliotheca”, “book”, “bookcase”, etc. The script only has one category right now, but it would be easy to add more. By doing so, I would be making associations that are very personal, sometimes inaccurate, and I would be creating a bias in the catalog.
+
+![Stopwords image](https://git.xpub.nl/rita/categorization_of_files/raw/branch/master/1600px-Common_words.png "Stopwords")
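
The matching step the README describes (take the three most common words, then look them up in a category's word list) could be sketched roughly as follows. This is not the script from the repository, only a minimal illustration under assumptions: the names `CATEGORIES`, `STOPWORDS`, `most_common_words` and `categorise` are hypothetical, the stopword list is a small placeholder (the README only hints at stopwords through its image), and only the “Library Studies” category and its seven keywords are taken from the README.

```python
import re
from collections import Counter

# Hypothetical keyword table. Only "Library Studies" and these seven
# words appear in the README; more categories would be added here.
CATEGORIES = {
    "Library Studies": {
        "archives", "author", "bibliographic", "bibliotheca",
        "book", "bookcase", "books",
    },
}

# Assumed placeholder stopword list, so that words like "the" do not
# crowd out the three most common words.
STOPWORDS = {
    "a", "an", "and", "are", "for", "in", "is", "it",
    "of", "on", "that", "the", "to", "with",
}


def most_common_words(text, n=3):
    """Return the n most frequent non-stopword words, lowercased."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]


def categorise(text):
    """Return the first category whose keywords include a top word, else None."""
    top_words = most_common_words(text)
    for category, keywords in CATEGORIES.items():
        if any(word in keywords for word in top_words):
            return category
    return None


if __name__ == "__main__":
    sample = "Books and archives: the books of the author are kept in the archives."
    print(most_common_words(sample))
    print(categorise(sample))
```

Keeping the word lists in a single dictionary makes the bias the README mentions easy to see: every association between a word and a category is an explicit, hand-written entry in `CATEGORIES`.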