Database

A table is a grid of rows and columns, similar to a spreadsheet. To create a table, name it and specify the order, names, and types of its columns. Each row has the same columns, although a column may be defined to allow missing data (called nulls). In the menu example, you could create a table with one row for each item being sold. Each item has the same columns, including one for the price.

A column or group of columns is usually the table’s primary key; its values must be unique in the table. This prevents adding the same data to the table more than once.
This key is indexed for fast lookups during queries.

If you want to find rows by some non-key column value, define a secondary index on that column. Otherwise, the database server must perform a table scan—a brute-force
search of every row for matching column values.

Tables can be related to each other with foreign keys, and column values can be constrained to these keys.
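
To make these ideas concrete, here is a minimal sketch using Python's sqlite3 module (covered later in this chapter); the menu and orders tables and their columns are made up for illustration:

import sqlite3

conn = sqlite3.connect(':memory:')
curs = conn.cursor()

# primary key: each item name must be unique in the table
curs.execute('''CREATE TABLE menu
    (item VARCHAR(20) PRIMARY KEY,
     price FLOAT)''')

# secondary index: speeds up lookups by a non-key column
curs.execute('CREATE INDEX menu_price_idx ON menu (price)')

# foreign key: each order row should refer to an existing menu item
# (SQLite only enforces this after PRAGMA foreign_keys = ON)
curs.execute('''CREATE TABLE orders
    (order_id INTEGER PRIMARY KEY,
     item VARCHAR(20) REFERENCES menu (item))''')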

SQL

There are two main categories of SQL statements:
DDL (data definition language)
Handles creation, deletion, constraints, and permissions for tables, databases, and users
DML (data manipulation language)
Handles data insertions, selects, updates, and deletions

The main DML operations of a relational database are often known by the acronym CRUD:
Create by using the SQL INSERT statement
Read by using SELECT
Update by using UPDATE
Delete by using DELETE
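
As a quick illustration of the CRUD cycle, here is a rough sketch using sqlite3 and a made-up menu table; any DB-API module works the same way:

import sqlite3

conn = sqlite3.connect(':memory:')
curs = conn.cursor()
curs.execute('CREATE TABLE menu (item TEXT PRIMARY KEY, price REAL)')

curs.execute("INSERT INTO menu VALUES ('soup', 3.50)")             # Create
curs.execute("SELECT item, price FROM menu")                       # Read
print(curs.fetchall())
curs.execute("UPDATE menu SET price = 4.00 WHERE item = 'soup'")   # Update
curs.execute("DELETE FROM menu WHERE item = 'soup'")               # Delete
conn.commit()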

DB-API

DB-API is Python’s standard API for accessing relational databases. Using it, you can write a single program that works with multiple kinds of relational databases instead of writing a separate program for each one. It’s similar to Java’s JDBC or Perl’s DBI.

Its main functions are the following:
connect()
Make a connection to the database; this can include arguments such as username, password, server address, and others.

cursor()
Create a cursor object to manage queries.

execute() and executemany()
Run one or more SQL commands against the database.

fetchone(), fetchmany(), and fetchall()
Get the results of an execute() call.

The Python database modules in the coming sections conform to DB-API, often with extensions and some differences in details.

MySQL

You can use the MySQLdb or pymysql module to connect to the database and create a connection object; after that, the operations are the same:

#!/usr/bin/python
# -*- coding: UTF-8 -*-

import MySQLdb

# connect to the database
db = MySQLdb.connect("localhost", "testuser", "test123", "TESTDB")

# create a cursor object
cursor = db.cursor()

# execute a SQL statement describing the data you want to fetch
cursor.execute("SELECT VERSION()")

# grab a single row with fetchone()
data = cursor.fetchone()

print("Database version : %s " % data)

# disconnect
db.close()

Output:

Database version : 5.0.45

Connect:

There are several ways to connect to the database:

Method 1:

conn=pymysql.connect(host='localhost',user='frankfuclick',passwd='12345678',db='test')

Method 2:

conn=pymysql.connect('localhost','frankfuclick','12345678','test')

Method 3: Another way to connect is to specify the parameters in a Python dictionary; note the quotation marks around the keys (host, user, and so on).

conn_params={
    "host":'localhost',
    "user":'frankfuclick',
    "passwd":'12345678',
    "db":'test'
    }

conn=pymysql.connect(**conn_params)
Execute():

Syntax:

cursor.execute(operation, params=None, multi=False)
iterator = cursor.execute(operation, params=None, multi=True)

This method executes the given database operation (query or command). The parameters found in the tuple or dictionary params are bound to the variables in the operation. Specify variables using %s or %(name)s parameter style (that is, using format or pyformat style). execute() returns an iterator if multi is True.
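
For example, to pass values safely instead of building the SQL string yourself (a rough sketch; the python table and its columns come from the example below):

# format style: positional %s placeholders, values passed as a tuple
cur.execute("SELECT name, income FROM python WHERE age > %s", (30,))

# pyformat style: named %(...)s placeholders, values passed as a dict
cur.execute("SELECT name FROM python WHERE sex = %(sex)s", {'sex': 'm'})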

Before fetching data, you must execute a query first. Without an execute() call, you will get an error:

ProgrammingError: execute() first

E.g., you have a database table named python:

name   sex  age  income
Frank  m    30   3000
Peter  m    45   4000

Part of the code is:

cur.execute("SELECT user FROM python")

data=cur.fetchall()

print(data)

The result will be

(('Frank',), ('Peter',))

 

If you change the code to:

cur.execute("SELECT user,income FROM python")

data=cur.fetchall()

print(data)

The result will be

(('Frank', 3000.0), ('Peter', 4000.0))

 

Insert data

 

import MySQLdb
db = MySQLdb.connect("localhost","testuser","test123","TESTDB" )
cursor=db.cursor()

sql2="""INSERT INTO EMPLOYEE(FIRST_NAME,LAST_NAME,AGE,SEX,INCOME) value ('Sam','Feng',30,'M',4000),('Peter','yu',30,'M',5000)"""
try:
    cursor.execute(sql2)
    db.commit()
except:
    db.rollback()
db.close()
Fetch data from database:

Python uses fetchone() to grab a single row, fetchall() to grab all the rows.

  • fetchone():
    Syntax:

    row = cursor.fetchone()
    

    This method retrieves the next row of a query result set and returns a single sequence, or None if no more rows are available. By default, the returned tuple consists of data returned by the MySQL server, converted to Python objects. If the cursor is a raw cursor, no such conversion occurs.

    The fetchone() method is used by fetchall() and fetchmany(). It is also used when a cursor is used as an iterator.

    The following example shows two equivalent ways to process a query result. The first uses fetchone() in a while loop, the second uses the cursor as an iterator:

    # SELECT * fetches all the columns
    cursor.execute("SELECT * FROM employees")
    row = cursor.fetchone()
    # Using a while loop
    while row is not None:
      print(row)
      row = cursor.fetchone()
    
    # Using the cursor as iterator
    cursor.execute("SELECT * FROM employees")
    for row in cursor:
      print(row)
    
  • fetchall():
    Syntax:

    rows = cursor.fetchall()
    

    The method fetches all (or all remaining) rows of a query result set and returns a list of tuples. If no more rows are available, it returns an empty list.

    The following example shows how to retrieve the first two rows of a result set, and then retrieve any remaining rows:

    >>> cursor.execute("SELECT * FROM employees ORDER BY emp_no")
    >>> head_rows = cursor.fetchmany(size=2)
    >>> remaining_rows = cursor.fetchall()
    
  • fetchmany():
    Syntax:

    rows = cursor.fetchmany(size=1)
    

    This method fetches the next set of rows of a query result and returns a list of tuples. If no more rows are available, it returns an empty list.

    The number of rows returned can be specified using the size argument, which defaults to one. Fewer rows are returned if fewer rows are available than specified.

    You must fetch all rows for the current query before executing new statements using the same connection.
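
    A common pattern (sketched here with a made-up employees table) is to process a large result set in batches so that it never has to fit in memory all at once:

    cursor.execute("SELECT * FROM employees")
    while True:
        batch = cursor.fetchmany(size=100)
        if not batch:
            break
        for row in batch:
            print(row)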

  • rowcount:
    The number of rows produced by the last SELECT (or affected by the last INSERT, UPDATE, or DELETE).

The following example grabs all the rows whose income is more than 1000:

#!/usr/bin/python
# -*- coding: UTF-8 -*-

import MySQLdb

db = MySQLdb.connect("localhost","testuser","test123","TESTDB" )

cursor = db.cursor()

sql = "SELECT * FROM EMPLOYEE \
       WHERE INCOME > '%d'" % (1000)
try:

   cursor.execute(sql)
   results = cursor.fetchall()
   for row in results:
      fname = row[0]
      lname = row[1]
      age = row[2]
      sex = row[3]
      income = row[4]
      print "fname=%s,lname=%s,age=%d,sex=%s,income=%d" % \
             (fname, lname, age, sex, income )
except:
   print "Error: unable to fecth data"

db.close()
Update data in database

E.g. increase the age of all male employees by one:

"UPDATE EMPLOYEE SET AGE = AGE + 1 WHERE SEX = '%c'" % ('M')
Delete rows

E.g. delete all rows whose age is greater than 20:

"DELETE FROM EMPLOYEE WHERE AGE > '%d'" % (20)
SQLite

SQLite is a good, light, open source relational database. It’s implemented as a standard Python library, and stores databases in normal files. These files are portable across machines and operating systems, making SQLite a very portable solution for simple relational database applications. It isn’t as full-featured as MySQL or PostgreSQL, but it does support SQL, and it manages multiple simultaneous users.

You begin with a connect() to the local SQLite database file that you want to use or create. This file is the equivalent of the directory-like database that parents tables in
other servers.

The special string ‘:memory:’ creates the database in memory only; this is fast and useful for testing but will lose data when your program terminates or if your
computer goes down.

 For the next example, let’s make a database called enterprise.db and the table zoo to manage our thriving roadside petting zoo business. The table columns are as follows:

critter
A variable length string, and our primary key
count
An integer count of our current inventory for this animal
damages
The dollar amount of our current losses from animal-human interactions

 >>> import sqlite3
 >>> conn = sqlite3.connect('enterprise.db')
 >>> curs = conn.cursor()
 >>> curs.execute('''CREATE TABLE zoo
 (critter VARCHAR(20) PRIMARY KEY,
 count INT,
 damages FLOAT)''')
 <sqlite3.Cursor object at 0x1006a22d0>

Python’s triple quotes are handy when creating long strings such as SQL queries.

Now, add some animals to the zoo:

>>> curs.execute('INSERT INTO zoo VALUES("duck", 5, 0.0)')
 <sqlite3.Cursor object at 0x1006a22d0>
>>> curs.execute('INSERT INTO zoo VALUES("bear", 2, 1000.0)')
 <sqlite3.Cursor object at 0x1006a22d0>

There’s a safer way to insert data, using a placeholder:

>>> ins = 'INSERT INTO zoo (critter, count, damages) VALUES(?, ?, ?)'
>>> curs.execute(ins, ('weasel', 1, 2000.0))
 <sqlite3.Cursor object at 0x1006a22d0>

This time, we used three question marks in the SQL to indicate that we plan to insert three values, and then passed those three values as a tuple to the execute() function. Placeholders handle tedious details such as quoting. They also protect you against SQL injection—a kind of external attack, common on the Web, that inserts malicious SQL commands into the system.
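
The same placeholder style also works with executemany(), mentioned earlier, which runs one insert for each tuple in a sequence; the goat and llama rows here are just made-up examples:

 >>> ins = 'INSERT INTO zoo (critter, count, damages) VALUES(?, ?, ?)'
 >>> curs.executemany(ins, [('goat', 3, 100.0), ('llama', 1, 0.0)])
 <sqlite3.Cursor object at 0x1006a22d0>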

Now, let’s see if we can get all our animals out again:

 >>> curs.execute('SELECT * FROM zoo')
 <sqlite3.Cursor object at 0x1006a22d0>
 >>> rows = curs.fetchall()
 >>> print(rows)
 [('duck', 5, 0.0), ('bear', 2, 1000.0), ('weasel', 1, 2000.0)]
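
One detail to keep in mind: sqlite3 wraps these changes in a transaction, so commit the connection (and close it) before your program ends, or the inserted rows can be lost:

 >>> conn.commit()
 >>> conn.close()
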
DBM family
       
        dbm.open(file, flag='r', mode=0o666)

Open the database file file and return a corresponding object.

If the database file already exists, the whichdb() function is used to determine its type and the appropriate module is used; if it does not exist, the first dbm submodule that can be imported (dbm.gnu, dbm.ndbm, or dbm.dumb) is used.

The optional flag argument can be:

Value  Meaning
'r'    Open existing database for reading only (default)
'w'    Open existing database for reading and writing
'c'    Open database for reading and writing, creating it if it doesn't exist
'n'    Always create a new, empty database, open for reading and writing

The optional mode argument is the Unix mode of the file, used only when the database has to be created. It defaults to octal 0o666 (and will be modified by the prevailing umask).

>>> import dbm
>>> db = dbm.open('definitions', 'c')

Keys and values are always stored as bytes. This means that when strings are used, they are implicitly converted to the default encoding before being stored. To create key-value pairs, just assign a value to a key, as you would with a dictionary:

 >>> db['mustard'] = 'yellow'    # creates a key "mustard" with the value "yellow"
 >>> db['ketchup'] = 'red'
 >>> db['pesto'] = 'green'

You cannot iterate over the database object db, but you can get the number of keys by using len(). Note that get() and setdefault() work as they do for dictionaries.
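
Continuing with the definitions database from above, a small sketch of those dictionary-like operations (the mayo key is made up for the example):

 >>> len(db)
 3
 >>> db.get('mustard')
 b'yellow'
 >>> db.get('mayo', b'missing')
 b'missing'
 >>> db.setdefault('mayo', b'white')
 b'white'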

If the underlying module is dbm.gnu (gdbm), some additional methods are available:

gdbm.firstkey()

It's possible to loop over every key in the database using this method and the nextkey() method. The traversal is ordered by gdbm's internal hash values, and won't be sorted by the key values. This method returns the starting key.

gdbm.nextkey(key)

Returns the key that follows key in the traversal. The following code prints every key in the database db, without having to create a list in memory that contains them all:

k = db.firstkey()
while k is not None:
    print(k)
    k = db.nextkey(k)
gdbm.reorganize()

If you have carried out a lot of deletions and would like to shrink the space used by the gdbm file, this routine will reorganize the database. gdbm objects will not shorten the length of a database file except by using this reorganization; otherwise, deleted file space will be kept and reused as new (key, value) pairs are added.

gdbm.sync()

When the database has been opened in fast mode, this method forces any unwritten data to be written to the disk.

gdbm.close()

Close the gdbm database.

Now close, then reopen to see if it actually saved what we gave it:

 >>> db.close()
 >>> db = dbm.open('definitions', 'r')
 >>> db['mustard']
 b'yellow'
Memcached

There are many Python drivers; one that works with Python 3 is python3-memcached,
which you can install by using this command:
$ pip install python3-memcached

To use it, connect to a memcached server, after which you can do the following:

• Set and get values for keys
• Increment or decrement a value
• Delete a key

Data is not persistent, and data that you wrote earlier might disappear. This is inherent in memcached, because it's a cache server: it avoids running out of memory by discarding old data.

 >>> import memcache
 >>> db = memcache.Client(['127.0.0.1:11211'])
 >>> db.set('marco', 'polo')
 True
 >>> db.get('marco')
 'polo'
 >>> db.set('ducks', 0)
 True
 >>> db.get('ducks')
 0
 >>> db.incr('ducks', 2)
 2
 >>> db.get('ducks')
 2
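
The remaining operations from that list, decrementing a value and deleting a key, look like this (return values may vary slightly between driver versions):

 >>> db.decr('ducks')
 1
 >>> db.delete('marco')
 1
 >>> db.get('marco')
 >>>
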
Redis

Redis can do the following:

• Save data to disk for reliability and restarts
• Keep old data
• Provide more data structures than simple strings

Install and configure redis server:

wget http://download.redis.io/redis-stable.tar.gz
tar xvzf redis-stable.tar.gz
cd redis-stable
make

or

apt-get install redis-server

Start redis:

$ redis-server

Verify:

$ redis-cli ping
PONG
 >>> import redis
 >>> conn = redis.Redis()

or redis.Redis('localhost'), redis.Redis('localhost', 6379)

List all keys (none so far):

>>> conn.keys('*')
 []

Set a simple string (key 'secret'), integer (key 'carats'), and float (key 'fever'):

 >>> conn.set('secret', 'ni!')
 True
 >>> conn.set('carats', 24)
 True
 >>> conn.set('fever', '101.5')
 True

Get the values back by key:

>>> conn.get('secret')
b'ni!'
>>> conn.get('carats')
b'24'
>>> conn.get('fever')
b'101.5'

Here, the setnx() method sets a value only if the key does not exist:

>>> conn.setnx('secret', 'icky-icky-icky-ptang-zoop-boing!')
False

The getset() method returns the old value and sets it to a new one at the same time:

>>> conn.getset('secret', 'icky-icky-icky-ptang-zoop-boing!')
b'ni!'

Get a substring by using getrange() (as in Python, offset 0=start, -1=end):

>>> conn.getrange('secret', -6, -1)
b'boing!'

Replace a substring by using setrange() (using a zero-based offset):

>>> conn.setrange('secret', 0, 'ICKY')
32
>>> conn.get('secret')
b'ICKY-icky-icky-ptang-zoop-boing!'

Set multiple keys by using mset():

>>> conn.mset({'pie': 'cherry', 'cordial': 'sherry'})
True

Get more than one value at once by using mget():

>>> conn.mget(['fever', 'carats'])
[b'101.5', b'24']

Delete a key by using delete():

>>> conn.delete('fever')
True

Increment the value by using the incr() or incrbyfloat() commands, and decrement with decr():

>>> conn.incr('carats')
25
>>> conn.incr('carats', 10)
35
>>> conn.decr('carats')
34
>>> conn.decr('carats', 15)
19
>>> conn.set('fever', '101.5')
True
>>> conn.incrbyfloat('fever')
102.5
>>> conn.incrbyfloat('fever', 0.5)
103.0

There's no decrbyfloat(). Use a negative increment to reduce the fever:

>>> conn.incrbyfloat('fever', -2.0)
101.0

Lists

Redis lists can contain only strings. The list is created when you do your first insertion.
Insert at the beginning by using lpush():

>>> conn.lpush('zoo', 'bear')
1

Insert more than one item at the beginning:

>>> conn.lpush('zoo', 'alligator', 'duck')
3

Insert before or after a value by using linsert():

>>> conn.linsert('zoo', 'before', 'bear', 'beaver')
4
>>> conn.linsert('zoo', 'after', 'bear', 'cassowary')
5

Set the value at an offset by using lset() (the list must exist already):

>>> conn.lset('zoo', 2, 'marmoset')
True

Insert at the end by using rpush():

>>> conn.rpush('zoo', 'yak')
6

Get the value at an offset by using lindex():

>>> conn.lindex('zoo', 3)
b'bear'

Get the values in an offset range by using lrange() (0 to -1 for all):

>>> conn.lrange('zoo', 0, 2)
[b'duck', b'alligator', b'marmoset']

Trim the list with ltrim(), keeping only those in a range of offsets:

>>> conn.ltrim('zoo', 1, 4)
True

Get a range of values (use 0 to -1 for all) by using lrange():

>>> conn.lrange('zoo', 0, -1)
[b'alligator', b'marmoset', b'bear', b'cassowary']


Hashes

Redis hashes are similar to Python dictionaries but can contain only strings. Thus, you can go only one level deep, not make deep-nested structures. Here are examples that create and play with a Redis hash called song:
Set the fields do and re in hash song at once by using hmset():

>>> conn.hmset('song', {'do': 'a deer', 're': 'about a deer'})
True

Set a single field value in a hash by using hset():

>>> conn.hset('song', 'mi', 'a note to follow re')
1

Get one field's value by using hget():

>>> conn.hget('song', 'mi')
b'a note to follow re'

Get multiple field values by using hmget():

>>> conn.hmget('song', 're', 'do')
[b'about a deer', b'a deer']

Get all field keys for the hash by using hkeys():

>>> conn.hkeys('song')
[b'do', b're', b'mi']

Get all field values for the hash by using hvals():

>>> conn.hvals('song')
[b'a deer', b'about a deer', b'a note to follow re']

Get the number of fields in the hash by using hlen():

>>> conn.hlen('song')
3

Get all field keys and values in the hash by using hgetall():

>>> conn.hgetall('song')
{b'do': b'a deer', b're': b'about a deer', b'mi': b'a note to follow re'}

Set a field if its key doesn't exist by using hsetnx():

>>> conn.hsetnx('song', 'fa', 'a note that rhymes with la')
1


Sorted sets

One of the most versatile Redis data types is the sorted set, or zset. It’s a set of unique values, but each value has an associated floating point score. You can access each item by its value or score. Sorted sets have many uses:

• Leader boards
• Secondary indexes
• Timeseries, using timestamps as scores

We’ll show the last use case, tracking user logins via timestamps.

>>> import time
>>> now = time.time()
>>> now
1361857057.576483

Let’s add our first guest, looking nervous:

>>> conn.zadd('logins', 'smeagol', now)
1

Five minutes later, another guest:

>>> conn.zadd('logins', 'sauron', now+(5*60))
1

Two hours later:

>>> conn.zadd('logins', 'bilbo', now+(2*60*60))
1

One day later, not hasty:

>>> conn.zadd('logins', 'treebeard', now+(24*60*60))
1

In what order did bilbo arrive?

>>> conn.zrank('logins', 'bilbo')
2

When was that?

>>> conn.zscore('logins', 'bilbo')
1361864257.576483

Let's see everyone in login order:

>>> conn.zrange('logins', 0, -1)
[b'smeagol', b'sauron', b'bilbo', b'treebeard']

With their times, please:

>>> conn.zrange('logins', 0, -1, withscores=True)
[(b'smeagol', 1361857057.576483), (b'sauron', 1361857357.576483),
(b'bilbo', 1361864257.576483), (b'treebeard', 1361943457.576483)]


Bits

This is a very space-efficient and fast way to deal with large sets of numbers. Suppose that you have a website with registered users. You’d like to track how often people log in, how many users visit on a particular day, how often the same user visits on following days, and so on. You could use Redis sets, but if you’ve assigned increasing numeric user IDs, bits are more compact and faster.

For this test, we’ll just use three days and a few user IDs:

>>> days = ['2013-02-25', '2013-02-26', '2013-02-27']
>>> big_spender = 1089
>>> tire_kicker = 40459
>>> late_joiner = 550212

Each date is a separate key. Set the bit for a particular user ID for that date. For example, on the first date (2013-02-25), we had visits from big_spender (ID 1089) and
tire_kicker (ID 40459):

>>> conn.setbit(days[0], big_spender, 1)
0
>>> conn.setbit(days[0], tire_kicker, 1)
0

The next day, big_spender came back:

>>> conn.setbit(days[1], big_spender, 1)
0

The next day had yet another visit from our friend, big_spender, and a new person whom we're calling late_joiner:

>>> conn.setbit(days[2], big_spender, 1)
0
>>> conn.setbit(days[2], late_joiner, 1)
0

Let’s get the daily visitor count for these three days:

>>> for day in days:
...     conn.bitcount(day)
...
2
1
2

Did a particular user visit on a particular day?

>>> conn.getbit(days[1], tire_kicker)
0

So, tire_kicker did not visit on the second day.
How many users visited every day?

>>> conn.bitop('and', 'everyday', *days)
68777
>>> conn.bitcount('everyday')
1

I'll give you three guesses who it was:

>>> conn.getbit('everyday', big_spender)
1

Finally, what was the number of total unique users in these three days?

>>> conn.bitop('or', 'alldays', *days)
68777
>>> conn.bitcount('alldays')
3


Caches and expiration

All Redis keys have a time-to-live, or expiration date. By default, this is forever. We can use the expire() function to instruct Redis how long to keep the key. As is demonstrated here, the value is a number of seconds:

>>> import time
>>> key = 'now you see it'
>>> conn.set(key, 'but not for long')
True
>>> conn.expire(key, 5)
True
>>> conn.ttl(key)
5
>>> conn.get(key)
b'but not for long'
>>> time.sleep(6)
>>> conn.get(key)
>>>

The expireat() command expires a key at a given epoch time. Key expiration is useful to keep caches fresh and to limit login sessions.
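
A minimal expireat() sketch (the countdown key is made up): give the key an absolute deadline, here ten seconds from now, expressed in epoch seconds:

>>> import time
>>> conn.set('countdown', 'liftoff')
True
>>> conn.expireat('countdown', int(time.time()) + 10)
True
>>> conn.ttl('countdown')
10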

Other NoSQL

The NoSQL servers listed here handle data larger than memory, and many of them use multiple computers. The following table presents notable servers and their Python libraries.

Site           Python API
Cassandra      pycassa
CouchDB        couchdb-python
HBase          happybase
Kyoto Cabinet  kyotocabinet
MongoDB        mongodb
Riak           riak-python-client

 

Full-Text Databases

There’s a special category of databases for full-text search. They index everything, so you can find that poem that talks about windmills and giant wheels of cheese. You can see some popular open source examples, and their Python APIs, in the table below.

Table: Full-text databases

Site           Python API
Solr           SolPython
ElasticSearch  pyes
Sphinx         sphinxapi
Xapian         xappy
Whoosh         written in Python, includes an API