Monitor the difference between two files

 

difflib compares two sequences of lines, such as the contents of two files, and reports the differences line by line.

This is useful when troubleshooting, for example to compare two versions of a configuration file.

import difflib

# First string to compare
text1 = """text1:
This module provides classes and functions for comparing sequences.
including HTML and context and unified diffs.
difflib document v7.4
add string
"""
text1_lines = text1.splitlines()  # split into lines so they can be compared

# Second string to compare
text2 = """text2:
This module provides classes and functions for Comparing sequences.
including HTML and context and unified diffs.
difflib document v7.5"""
text2_lines = text2.splitlines()

d = difflib.Differ()  # create a Differ object
diff = d.compare(text1_lines, text2_lines)  # compare the two lists of lines
print('\n'.join(diff))

Output:

- text1: 

?     ^

+ text2: 

?     ^

- This module provides classes and functions for comparing sequences.

?                                                ^

+ This module provides classes and functions for Comparing sequences.

?                                                ^

  including HTML and context and unified diffs.

- difflib document v7.4

?                     ^

+ difflib document v7.5

?                     ^

- add string

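Meaning of the symbols:

  • - : the line appears only in the first text.
  • + : the line appears only in the second text.
  • (two leading spaces) : the line is identical in both texts.
  • ? : a hint line that is not part of either text; it points at the characters that differ in the neighbouring lines.
  • ^ : on a ? line, marks a character that was changed.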

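To create a more readable HTML report, replace the last three lines with the following (difference_html is assumed to come from difflib.HtmlDiff.make_file()):

d = difflib.HtmlDiff()
difference_html = d.make_file(text1_lines, text2_lines)  # assumed source of difference_html

# Write the report as UTF-8 text
with open('difference.html', 'w', encoding='utf-8') as fout:
    fout.write(difference_html)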
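For the configuration-file case mentioned at the start, a unified diff is often more compact than the Differ output. The following is a minimal sketch rather than part of the original script: the helper name diff_config_files and the file names old.conf and new.conf are hypothetical, and the comparison itself relies on the standard-library function difflib.unified_diff.

```python
import difflib
import os
import tempfile


def diff_config_files(old_path, new_path):
    """Return the unified diff between two text files as a list of lines."""
    with open(old_path, encoding="utf-8") as f:
        old_lines = f.read().splitlines()
    with open(new_path, encoding="utf-8") as f:
        new_lines = f.read().splitlines()
    # lineterm="" because splitlines() already removed the newlines
    return list(difflib.unified_diff(
        old_lines, new_lines,
        fromfile=old_path, tofile=new_path, lineterm=""))


# Demo with two throwaway files (hypothetical configuration contents)
workdir = tempfile.mkdtemp()
old_conf = os.path.join(workdir, "old.conf")
new_conf = os.path.join(workdir, "new.conf")
with open(old_conf, "w", encoding="utf-8") as f:
    f.write("port = 8080\ndebug = false\n")
with open(new_conf, "w", encoding="utf-8") as f:
    f.write("port = 8080\ndebug = true\n")

config_diff = diff_config_files(old_conf, new_conf)
print("\n".join(config_diff))
```

Only changed lines and a little surrounding context are printed, and two identical files produce an empty list, which makes the result easy to test in a script.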

 

filecmp

 

filecmp compares files, directories, and subdirectories, which is useful for audits and for verifying backups.

It is part of the standard library in Python 2.3 and later.

Methods:

 

1. cmp: compare two files

Syntax: filecmp.cmp(f1, f2[, shallow])

Returns True if files f1 and f2 seem equal, and False otherwise.

shallow: defaults to True. In that case the two files are taken to be equal as soon as their os.stat() signatures are identical, without reading their contents. If the signatures differ, or if shallow is False, the sizes and contents are compared.

The signature consists of the file type, the size, and the modification time.
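The effect of shallow can be seen with two files that have the same size and modification time but different contents. This is a small sketch under stated assumptions: the file names a.txt and b.txt and the timestamp are made up, and os.utime is used only to force matching modification times.

```python
import filecmp
import os
import tempfile

workdir = tempfile.mkdtemp()
path_a = os.path.join(workdir, "a.txt")
path_b = os.path.join(workdir, "b.txt")

# Same size, different content
with open(path_a, "w") as f:
    f.write("version=1\n")
with open(path_b, "w") as f:
    f.write("version=2\n")

# Force identical modification times so the os.stat() signatures match
stamp = 1_000_000_000 * 10**9  # an arbitrary timestamp, in nanoseconds
os.utime(path_a, ns=(stamp, stamp))
os.utime(path_b, ns=(stamp, stamp))

shallow_equal = filecmp.cmp(path_a, path_b)                 # signatures only
deep_equal = filecmp.cmp(path_a, path_b, shallow=False)     # reads the contents

print("shallow:", shallow_equal)
print("deep:   ", deep_equal)
```

For audits where a false "equal" is costly, passing shallow=False is the safer choice, at the price of reading both files.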

2. cmpfiles: compare multiple files

Syntax: filecmp.cmpfiles(dir1, dir2, common[, shallow])

Compares the files named in the list common in the two directories dir1 and dir2, and returns three lists of file names: match, mismatch, and errors. The errors list contains files that could not be compared, for example because they do not exist in one of the directories or the user lacks permission to read them.
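A short sketch of the three returned lists, using temporary directories. The directory and file names, and the helper write_file, are invented for the illustration.

```python
import filecmp
import os
import tempfile


def write_file(path, content):
    """Create a small text file (helper for this demo only)."""
    with open(path, "w") as f:
        f.write(content)


root = tempfile.mkdtemp()
dir1 = os.path.join(root, "dir1")
dir2 = os.path.join(root, "dir2")
os.mkdir(dir1)
os.mkdir(dir2)

write_file(os.path.join(dir1, "same.txt"), "identical\n")
write_file(os.path.join(dir2, "same.txt"), "identical\n")
write_file(os.path.join(dir1, "changed.txt"), "old\n")
write_file(os.path.join(dir2, "changed.txt"), "new content\n")
write_file(os.path.join(dir1, "only_in_dir1.txt"), "orphan\n")

# Only the names listed here are compared
names = ["same.txt", "changed.txt", "only_in_dir1.txt"]
match, mismatch, errors = filecmp.cmpfiles(dir1, dir2, names, shallow=False)

print("match:   ", match)
print("mismatch:", mismatch)
print("errors:  ", errors)
```

The file that exists in only one directory ends up in errors rather than mismatch, so a backup check should inspect both lists.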

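3. dircmp: compare directories

dircmp is a class; create an object with filecmp.dircmp(a, b). It has three report methods, each of which prints to standard output:

  • .report() : compare the two directories themselves.
  • .report_partial_closure() : compare the two directories and their immediate common subdirectories.
  • .report_full_closure() : compare the two directories and all common subdirectories, recursively.

The most general report is report():

import filecmp

result_dir = filecmp.dircmp("dir a", "dir b")

result_dir.report()  # prints the report itself and returns None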

Other attributes (a and b are the two directories passed to dircmp):

left
The directory a.
right
The directory b.
left_list
Files and subdirectories in a, filtered by hide and ignore.
right_list
Files and subdirectories in b, filtered by hide and ignore.
common
Files and subdirectories in both a and b.
left_only
Files and subdirectories only in a.
right_only
Files and subdirectories only in b.
common_dirs
Subdirectories in both a and b.
common_files
Files in both a and b.
common_funny
Names in both a and b, such that the type differs between the directories, or names for which os.stat() reports an error.
same_files
Files which are identical in both a and b, using the class’s file comparison operator.
diff_files
Files which are in both a and b, whose contents differ according to the class’s file comparison operator.
funny_files
Files which are in both a and b, but could not be compared.
subdirs
A dictionary mapping names in common_dirs to dircmp objects

 

Example:

import filecmp

result_dir=filecmp.dircmp("dir1","dir2")

result_dir.report( )

result_dir.report_partial_closure( )

result_dir.report_full_closure( )

print ("left_list: "+ str( result_dir.left_list))

print ("right_list: "+ str( result_dir.right_list))

print ("common: "+ str( result_dir.common))

print ("left_only: "+ str( result_dir.left_only))

print ("right_only: "+ str( result_dir.right_only))

print ("common_dirs: "+ str( result_dir.common_dirs))

print ("common_files: "+ str( result_dir.common_files))

print ("common_funny: "+ str( result_dir.common_funny))

print ("same_file: "+ str( result_dir.same_files))

print ("diff_files: "+ str( result_dir.diff_files))

print ("funny_files: "+ str( result_dir.funny_files))

Output:

diff dir1 dir2

Only in dir2 : ['.DS_Store']

Identical files : ['file2.pdf']

Differing files : ['file1.doc']

diff dir1 dir2

Only in dir2 : ['.DS_Store']

Identical files : ['file2.pdf']

Differing files : ['file1.doc']

diff dir1 dir2

Only in dir2 : ['.DS_Store']

Identical files : ['file2.pdf']

Differing files : ['file1.doc']

left_list: ['file1.doc', 'file2.pdf']

right_list: ['.DS_Store', 'file1.doc', 'file2.pdf']

common: ['file1.doc', 'file2.pdf']

left_only: []

right_only: ['.DS_Store']

common_dirs: []

common_files: ['file1.doc', 'file2.pdf']

common_funny: []

same_file: ['file2.pdf']

diff_files: ['file1.doc']

funny_files: []
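
The subdirs attribute makes it possible to walk two directory trees and gather every difference into one structure instead of reading the printed reports. The sketch below is one possible way to do that; the function name collect_differences and the demo directory layout are hypothetical.

```python
import filecmp
import os
import tempfile


def collect_differences(dcmp, found=None):
    """Walk a dircmp object recursively and collect paths that differ."""
    if found is None:
        found = {"diff": [], "left_only": [], "right_only": []}
    for name in dcmp.diff_files:
        found["diff"].append(os.path.join(dcmp.left, name))
    for name in dcmp.left_only:
        found["left_only"].append(os.path.join(dcmp.left, name))
    for name in dcmp.right_only:
        found["right_only"].append(os.path.join(dcmp.right, name))
    # subdirs maps each common subdirectory name to another dircmp object
    for sub in dcmp.subdirs.values():
        collect_differences(sub, found)
    return found


# Demo tree: a/sub/conf.txt differs from b/sub/conf.txt, b has an extra file
root = tempfile.mkdtemp()
dir_a = os.path.join(root, "a")
dir_b = os.path.join(root, "b")
os.makedirs(os.path.join(dir_a, "sub"))
os.makedirs(os.path.join(dir_b, "sub"))
with open(os.path.join(dir_a, "sub", "conf.txt"), "w") as f:
    f.write("x=1\n")
with open(os.path.join(dir_b, "sub", "conf.txt"), "w") as f:
    f.write("x=1000\n")  # different size, so the default shallow check notices
with open(os.path.join(dir_b, "extra.txt"), "w") as f:
    f.write("only here\n")

result = collect_differences(filecmp.dircmp(dir_a, dir_b))
print(result)
```

Note that dircmp uses the shallow comparison by default, so files with equal size and modification time but different contents would not be reported here.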
smtplib

 

# coding: utf-8
from io import StringIO
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.header import Header
from email.utils import formataddr
from email import charset
from email.generator import Generator
import smtplib

# Example address data: [display name, e-mail address]
from_address = ['Frank', '[email protected]']
recipient = ['Frank too', '[email protected]']
subject = 'Unicode test'

# Example body
html = 'Unicode?\nTest?'
text = 'Unicode?\nTest?\nTest from Frank, just ignore it'

# Set the default encoding for UTF-8 to quoted-printable. This acts globally!
charset.add_charset('utf-8', charset.QP, charset.QP, 'utf-8')

# 'alternative' MIME type: HTML and plain text bundled in one e-mail message
msg = MIMEMultipart('alternative')
msg['Subject'] = Header(subject, 'utf-8')
# Only the descriptive part of sender and recipient is encoded, not the e-mail address
msg['From'] = formataddr((from_address[0], from_address[1]))
msg['To'] = formataddr((recipient[0], recipient[1]))

# Attach both parts; the last part attached is the one mail clients prefer
textpart = MIMEText(text, 'plain', 'UTF-8')
htmlpart = MIMEText(html, 'html', 'UTF-8')
msg.attach(textpart)
msg.attach(htmlpart)

# Create a generator and flatten the message object to a 'file'
str_io = StringIO()
g = Generator(str_io, False)
g.flatten(msg)
# str_io.getvalue() now contains the message, ready to send

# Optionally send it using Python's smtplib (or use Django's mail functions)
s = smtplib.SMTP('smtp.gmail.com', 587)
# If you don't want to encrypt the connection, comment out the following 3 lines
s.ehlo()
s.starttls()
s.ehlo()
s.login("<user_name>", "<password>")
s.sendmail(from_address[1], recipient[1], str_io.getvalue())
s.quit()
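
On Python 3.6 and later, the same kind of plain-text plus HTML message can also be built with email.message.EmailMessage, which takes care of header and body encoding itself. This is an alternative sketch, not the method used above; the example.com addresses and the body texts are placeholders, and nothing is actually sent.

```python
from email.message import EmailMessage

# Hypothetical example addresses on the reserved example.com domain
msg = EmailMessage()
msg["Subject"] = "Unicode test ✓"
msg["From"] = "Frank <frank@example.com>"
msg["To"] = "Frank too <frank.too@example.com>"

# The plain-text part goes first, the HTML alternative is added after it
msg.set_content("Plain-text body: héllo\n")
msg.add_alternative("<p>HTML body: h&eacute;llo</p>", subtype="html")

parts = [part.get_content_type() for part in msg.iter_parts()]
print(msg.get_content_type())
print(parts)

# Serialized form, as it would go over the wire
wire_format = msg.as_bytes()
```

A message built this way can be handed to an smtplib.SMTP connection with its send_message() method instead of flattening it by hand.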

 

Pycurl

Before installing pycurl, install libcurl4-openssl-dev:

sudo apt-get install libcurl4-openssl-dev

Then install pycurl:

sudo pip install pycurl

The following script requests a URL, saves the response to content.txt, and reports the timing of each phase of the request:

import os
import sys
import pycurl

URL = input("Type a website address first: ")  # read the website address from the user
c = pycurl.Curl()  # create a Curl object
c.setopt(pycurl.URL, URL)  # set the URL
c.setopt(pycurl.CONNECTTIMEOUT, 5)  # timeout for establishing the connection, in seconds
c.setopt(pycurl.TIMEOUT, 5)  # timeout for the whole request, in seconds
c.setopt(pycurl.NOPROGRESS, 1)  # 0 shows the progress bar, any other value hides it
c.setopt(pycurl.FORBID_REUSE, 1)  # close the connection after the transfer instead of reusing it
c.setopt(pycurl.MAXREDIRS, 1)  # maximum number of HTTP redirections
c.setopt(pycurl.DNS_CACHE_TIMEOUT, 30)  # how long to keep DNS cache entries, in seconds

# Headers and body are both written to content.txt next to this script
script_dir = os.path.dirname(os.path.realpath(__file__))
indexfile = open(os.path.join(script_dir, "content.txt"), "wb")
c.setopt(pycurl.WRITEHEADER, indexfile)  # save the HTTP headers to indexfile
c.setopt(pycurl.WRITEDATA, indexfile)  # save the HTTP body to indexfile

try:
    c.perform()  # send the request
except Exception as e:  # handle connection errors
    print("connection error: " + str(e))
    indexfile.close()
    c.close()
    sys.exit()

NAMELOOKUP_TIME = c.getinfo(c.NAMELOOKUP_TIME)  # DNS resolution time
CONNECT_TIME = c.getinfo(c.CONNECT_TIME)  # time to establish the connection
PRETRANSFER_TIME = c.getinfo(c.PRETRANSFER_TIME)  # time until the transfer is about to begin
STARTTRANSFER_TIME = c.getinfo(c.STARTTRANSFER_TIME)  # time until the first byte is received
TOTAL_TIME = c.getinfo(c.TOTAL_TIME)  # total time of the request
HTTP_CODE = c.getinfo(c.HTTP_CODE)  # HTTP status code
SIZE_DOWNLOAD = c.getinfo(c.SIZE_DOWNLOAD)  # size of the downloaded data
HEADER_SIZE = c.getinfo(c.HEADER_SIZE)  # size of the HTTP headers
SPEED_DOWNLOAD = c.getinfo(c.SPEED_DOWNLOAD)  # average download speed

print("HTTP status code: %s" % HTTP_CODE)
print("DNS resolution time: %.2f ms" % (NAMELOOKUP_TIME * 1000))
print("Time to establish connection: %.2f ms" % (CONNECT_TIME * 1000))
print("Time to prepare for transfer: %.2f ms" % (PRETRANSFER_TIME * 1000))
print("Time to start the transfer: %.2f ms" % (STARTTRANSFER_TIME * 1000))
print("Total transfer time: %.2f ms" % (TOTAL_TIME * 1000))
print("Size downloaded: %d bytes" % SIZE_DOWNLOAD)
print("HTTP header size: %d bytes" % HEADER_SIZE)
print("Average download speed: %d bytes/s" % SPEED_DOWNLOAD)

indexfile.close()
c.close()