

Mashup()

Mashup() is a function that compares two similar texts and produces a third text by randomly choosing between the originals wherever they differ. The words that both texts share stay fixed; only the differences are decided by random choice.

A function that:

  1. takes 2 similar texts as input (example: 2 different translations of a poem)
  2. finds the fixed_words (the words that appear in both texts) and uses them as the fixed text for the new piece of text
  3. puts the results together into a new piece, randomly choosing between the different options
  4. html output that highlights the randomly chosen differences between the translations
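Step 2 can be illustrated on a single pair of lines. The following is a minimal sketch, not the notebook's own code: the helper name `fixed_words_in_line` is made up for this example, and it relies only on the two-character codes that `difflib.Differ` puts in front of each item.

```python
import difflib

def fixed_words_in_line(line_a, line_b):
    """Return the words that difflib marks as common to both lines, in order.

    Hypothetical helper for illustration only.
    """
    words_a = line_a.split()
    words_b = line_b.split()
    fixed = []
    for result in difflib.Differ().compare(words_a, words_b):
        # Every result starts with a two-character code:
        # '  ' in both lines, '- ' only in the first, '+ ' only in the second,
        # '? ' a hint line that does not correspond to a word.
        if result.startswith('  '):
            fixed.append(result[2:])
    return fixed

print(fixed_words_in_line("The glasses were empty", "So the glasses were empty"))
# → ['glasses', 'were', 'empty']
```

The comparison is case-sensitive and treats punctuation as part of the word, so `drunk` and `drunk,` do not count as the same word.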

how to use: the two texts must have the same number of lines, because the function goes through both texts and compares them line by line. The same approach also works with lists of strings, as long as the splitlines() step is skipped.
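Because the comparison is line by line, it can help to check the line counts before calling the function. A small sketch under that assumption; `prepare_lines` and `check_same_length` are hypothetical helper names, not part of the notebook.

```python
def prepare_lines(text):
    """Split a text into lines after removing leading and trailing newline characters.

    Triple-quoted strings often start with a newline, which would otherwise
    produce an empty first line.
    """
    return text.strip('\n').splitlines()

def check_same_length(text1, text2):
    """Return both texts as lists of lines, or raise if the line counts differ."""
    lines1, lines2 = prepare_lines(text1), prepare_lines(text2)
    if len(lines1) != len(lines2):
        raise ValueError(
            f"texts have {len(lines1)} and {len(lines2)} lines; "
            "they are compared pairwise, so the counts must match"
        )
    return lines1, lines2

lines1, lines2 = check_same_length("\nThe door\nwas open\n", "\nA door\nstood open\n")
print(len(lines1), len(lines2))
```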

input: 2 texts that are similar but not exact copies of each other

output: a new text that showcases the differences, made out of random choices between the two originals. The final format is still not clear yet (extracted pdf/txt file?).
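One possible shape for the html output is sketched below. It assumes a made-up intermediate format, a list of `(words, was_random_choice)` pairs per line, and wraps the randomly chosen parts in `<mark>` tags. Both the format and the name `render_line_html` are assumptions for illustration; the notebook does not define them.

```python
from html import escape

def render_line_html(segments):
    """Render one line of the mashup as html.

    segments: list of (words, was_random_choice) pairs (hypothetical format).
    Randomly chosen parts are wrapped in <mark>, fixed parts are left plain.
    """
    parts = []
    for words, was_random_choice in segments:
        text = escape(' '.join(words))
        if not text:
            continue  # skip empty choices so no empty tags are produced
        if was_random_choice:
            text = f'<mark>{text}</mark>'
        parts.append(text)
    return ' '.join(parts)

line = [(['So', 'the'], True), (['glasses', 'were', 'empty'], False)]
print(render_line_html(line))
# → <mark>So the</mark> glasses were empty
```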

In [2]:
# define texts

text1= '''
The glasses were empty
The bottle was shattered
The bed was wide open
The door was tight shuttered
Each shard was a star
Of bliss and of beauty
That flashed on the floor
All dusty and dirty
And I was dead drunk
Lit up wildly ablaze
You were drunk and alive
In a naked embrace!
'''
text2= '''
So the glasses were empty
and the bottle broken
And the bed was wide open
and the door closed
And all of the glass stars
of happiness and beauty 
were sparkling in the dust
of the poorly dusted room.
And I was dead drunk
And I was a bonfire
And you were alive, drunk,
all naked in my arms.
'''
In [3]:
import difflib
from random import choice

def mashup(text1,text2): #take into account 2 texts
    
    text1 = text1.splitlines() #split texts in lines
    text2 = text2.splitlines()
    
    fixed_words= [] #define empty list for fixed_words (words that are the same in both texts) // a list of lists of words
    for line_A, line_B in zip(text1, text2): #start the first loop reading line by line from both texts at the same time (=zip)
        words_A = line_A.split() #split lines in lists of words
        words_B = line_B.split()
    
        d = difflib.Differ() #Differ compares two sequences and marks every item with a code: '- ' (only in text1), '+ ' (only in text2), '  ' (in both = fixed_words), '? ' (hint line, not a word)
        diff = d.compare(words_A, words_B)  #compare the two lists of words
        
        
        linelist = [] #define empty list for the fixed words of this line
        for result in diff: #second loop that goes through every result of the comparison for this line
            code, word = result.split(' ', 1) #split result of diff in code ['+', '-', '?' or ''] and the word that follows it
            word = word.strip() #to be sure it doesn't have any leftover whitespace or \n at the ends
            if code == '' : #an empty code means the result started with a space, so the word can be found in both texts
                linelist.append(word) #if this happens, put the word in the linelist
        fixed_words.append(linelist) #afterwards, add linelist to fixed_words (once per line, so fixed_words ends up with one list of fixed words for every line)
            
    for linenumber in range(len(fixed_words)): #go through the lines by number (fixed_words has one entry for every compared pair of lines)
        cut_left1 = 0 #the beginning of both lines is position n°0 (on the left side of the lines)
        cut_left2 = 0
        search_from1 = 0 #position from which to look for the next fixed word, so that a repeated word is not found twice
        search_from2 = 0
        words_1 = text1[linenumber].split() #split the current line of each text in words
        words_2 = text2[linenumber].split()    
        if len(fixed_words[linenumber]) > 0: #if this line has at least one fixed word
            for fixed_word in fixed_words[linenumber]: #go through the fixed words of this line from left to right
                cut_right1 = words_1.index(fixed_word, search_from1) #find the position of the next fixed_word, searching to the right of the previous one
                cut_right2 = words_2.index(fixed_word, search_from2) #in both texts

                slice_1 =  words_1[cut_left1 : cut_right1] #slice_1 goes from the previous cut up to (not including) this fixed word
                slice_2 =  words_2[cut_left2 : cut_right2]
                print(choice([slice_1, slice_2])) #randomly choose one of the two slices
                
                cut_left1 = cut_right1 #the next slice starts at this fixed word
                cut_left2 = cut_right2
                search_from1 = cut_right1 + 1 #and the next search starts just after it
                search_from2 = cut_right2 + 1

            slice_1 =  words_1[cut_left1 :] #from the last fixed_word found to the end of the line
            slice_2 =  words_2[cut_left2 :]
            print(choice([slice_1, slice_2])) #choose
        else:
            slice_1 =  words_1[cut_left1 :] #no fixed words in this line: the choice is between the two whole lines
            slice_2 =  words_2[cut_left2 :]
            print(choice([slice_1, slice_2])) #choose
        print('--------')    
            
In [4]:
mashup(text1,text2)
[]
--------
['So', 'the']
['glasses']
['were']
['empty']
--------
['and', 'the']
['bottle', 'was', 'shattered']
--------
['The']
['bed']
['was']
['wide']
['open']
--------
['and', 'the']
['door', 'was', 'tight', 'shuttered']
--------
['And', 'all', 'of', 'the', 'glass', 'stars']
--------
['of', 'happiness']
['and']
['beauty']
--------
['were', 'sparkling', 'in']
['the', 'dust']
--------
['All', 'dusty', 'and', 'dirty']
--------
[]
['And']
['I']
['was']
['dead']
['drunk']
--------
['And', 'I', 'was', 'a', 'bonfire']
--------
['And', 'you']
['were', 'alive,', 'drunk,']
--------
['In', 'a']
['naked', 'in', 'my', 'arms.']
--------
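The output above prints every chosen slice as a separate list. To get the mashup back as a text, the chosen words have to be joined per line. The sketch below is one way to do that and is not the notebook's method: it swaps in `difflib.SequenceMatcher.get_opcodes()` instead of `Differ` plus `index()`, and the name `mashup_lines` and the optional `rng` parameter are assumptions made for this example.

```python
import difflib
import random

def mashup_lines(text1, text2, rng=None):
    """Return a mashup of two texts as a single string (hypothetical variant).

    Words shared by both lines are kept; wherever the lines differ,
    one of the two versions is picked at random.
    """
    rng = rng or random.Random()
    result = []
    for line_a, line_b in zip(text1.splitlines(), text2.splitlines()):
        words_a, words_b = line_a.split(), line_b.split()
        matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
        new_words = []
        for tag, a1, a2, b1, b2 in matcher.get_opcodes():
            if tag == 'equal':
                # fixed words: identical in both lines
                new_words.extend(words_a[a1:a2])
            else:
                # 'replace', 'delete' or 'insert': pick one side at random
                # (one side may be empty, which drops the words entirely)
                new_words.extend(rng.choice([words_a[a1:a2], words_b[b1:b2]]))
        result.append(' '.join(new_words))
    return '\n'.join(result)

sample1 = "The glasses were empty\nThe bottle was shattered"
sample2 = "So the glasses were empty\nand the bottle broken"
print(mashup_lines(sample1, sample2, rng=random.Random(0)))
```

Passing a seeded `random.Random` makes a run repeatable, which is useful when comparing versions of the same mashup.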