Programming Collective Intelligence - Chapter 5: Optimization

Group travel

import time
import random
import math
people=[('Seymour','BOS'),
        ('Franny','DAL'),
        ('Zooey','CAK'),
        ('Walt','MIA'),
        ('Buddy','ORD'),
        ('Les','OMA')]
destination='LGA'

Flight data: schedule.txt

Each line contains: origin, destination, departure time, arrival time, price

Code in optimization.py for loading the data

flights={}
for line in open('schedule.txt'):
    origin,dest,depart,arrive,price=line.strip().split(',')
    #key:(origin,dest) value:list of (depart,arrive,price)
    flights.setdefault((origin,dest),[])
    flights[(origin,dest)].append((depart,arrive,int(price)))

def getminutes(t):
    #Convert an 'H:MM' time string to minutes after midnight
    x=time.strptime(t,'%H:%M')
    #x[3] is hours; x[4] is minutes
    return x[3]*60+x[4]
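As a minimal sketch of what the loader produces, the snippet below parses a few lines held in a list instead of reading schedule.txt. The sample lines are made up for illustration and are not taken from the real data file; the code uses Python 3 syntax.

```python
import time

# Hypothetical sample lines in the schedule.txt format (not real flight data)
sample_lines = [
    "BOS,LGA,6:17,8:26,89",
    "BOS,LGA,8:04,10:11,95",
    "LGA,BOS,6:39,8:09,86",
]

def getminutes(t):
    """Convert an 'H:MM' time string to minutes after midnight."""
    x = time.strptime(t, "%H:%M")
    return x.tm_hour * 60 + x.tm_min

flights = {}
for line in sample_lines:
    origin, dest, depart, arrive, price = line.strip().split(",")
    # key: (origin, dest); value: list of (depart, arrive, price)
    flights.setdefault((origin, dest), []).append((depart, arrive, int(price)))

print(len(flights[("BOS", "LGA")]))  # 2 outbound flights for this pair
print(getminutes("8:26"))            # 8*60 + 26 = 506
```

Each (origin, destination) pair maps to a list, so a flight can later be picked by its position in that list.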

Representing solutions

#Each person takes two slots: r[2*d] is the outbound flight index, r[2*d+1] the return flight index
r=[1,4,3,2,7,3,6,3,2,4,5,3]
def printschedule(r):
    for d in range(len(r)//2):
        name=people[d][0]
        origin=people[d][1]
        out=flights[(origin,destination)][int(r[2*d])]
        ret=flights[(destination,origin)][int(r[2*d+1])]
        print('%10s%10s %5s-%5s $%3s %5s-%5s $%3s'%(name,origin,out[0],out[1],out[2],ret[0],ret[1],ret[2]))
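The list format can be read as pairs of flight indices, one pair per person. The sketch below only decodes a short list and needs no flight data; the helper name decode and the two-person list are assumptions made for illustration (Python 3 syntax).

```python
# Shortened list of travellers, for illustration only
people = [("Seymour", "BOS"), ("Franny", "DAL")]

def decode(r, people):
    """Pair each person with (outbound index, return index) taken from r."""
    return [(people[d][0], r[2 * d], r[2 * d + 1]) for d in range(len(r) // 2)]

r = [1, 4, 3, 2]
for name, out_idx, ret_idx in decode(r, people):
    print("%s: outbound flight #%d, return flight #%d" % (name, out_idx, ret_idx))
```

Here the first person takes outbound flight 1 and return flight 4, and the second takes outbound flight 3 and return flight 2.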

Cost function

The smaller the return value, the better.

This function adds up the total price of the tickets and the total time the family members spend waiting at the airport. A further $50 is charged if the rental car is returned at a later time of day than it was picked up.

sol=r
def schedulecost(sol):
    totalprice=0
    latestarrival=0
    earliestdep=24*60
    for d in range(len(sol)//2):
        origin=people[d][1]
        outbound=flights[(origin,destination)][int(sol[2*d])]
        returnf=flights[(destination,origin)][int(sol[2*d+1])]
        #Total price: outbound plus return flight
        totalprice+=outbound[2]
        totalprice+=returnf[2]
        #Record the latest arrival time and the earliest departure time
        if latestarrival<getminutes(outbound[1]):latestarrival=getminutes(outbound[1])
        if earliestdep>getminutes(returnf[0]):earliestdep=getminutes(returnf[0])
    #Everyone has to wait at the airport until the last person arrives
    #They also go to the airport together on the way back and wait there for their own return flights
    totalwait=0
    for d in range(len(sol)//2):
        origin=people[d][1]
        outbound=flights[(origin,destination)][int(sol[2*d])]
        returnf=flights[(destination,origin)][int(sol[2*d+1])]
        totalwait+=latestarrival-getminutes(outbound[1])
        totalwait+=getminutes(returnf[0])-earliestdep
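    #The car is picked up at latestarrival and returned at earliestdep;
    #returning it at a later time of day than the pickup costs an extra $50
    if latestarrival<earliestdep: totalprice+=50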
    return totalprice+totalwait
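To make the arithmetic concrete, here is a minimal sketch that reports price and waiting time separately for a made-up two-person schedule. The travellers, the flights and the helper name cost_breakdown are hypothetical, and the car-rental charge is left out so the numbers are easy to check by hand (Python 3 syntax).

```python
def minutes(t):
    """Convert 'H:MM' to minutes after midnight."""
    h, m = t.split(":")
    return int(h) * 60 + int(m)

destination = "LGA"
people = [("A", "BOS"), ("B", "DAL")]
# Made-up flights: one choice per route, as (depart, arrive, price)
flights = {
    ("BOS", "LGA"): [("8:00", "10:00", 100)],
    ("DAL", "LGA"): [("9:00", "12:00", 200)],
    ("LGA", "BOS"): [("15:00", "17:00", 110)],
    ("LGA", "DAL"): [("16:00", "19:00", 210)],
}

def cost_breakdown(sol):
    """Return (total price, total waiting minutes) for a solution list."""
    legs = []
    for d in range(len(sol) // 2):
        origin = people[d][1]
        legs.append((flights[(origin, destination)][sol[2 * d]],
                     flights[(destination, origin)][sol[2 * d + 1]]))
    totalprice = sum(out[2] + ret[2] for out, ret in legs)
    latestarrival = max(minutes(out[1]) for out, ret in legs)
    earliestdep = min(minutes(ret[0]) for out, ret in legs)
    # Wait for the last arrival, then wait for one's own return flight
    totalwait = sum((latestarrival - minutes(out[1])) +
                    (minutes(ret[0]) - earliestdep) for out, ret in legs)
    return totalprice, totalwait

price, wait = cost_breakdown([0, 0, 0, 0])
print(price, wait, price + wait)  # 620 180 800
```

Traveller A waits 120 minutes for B to land, and B waits 60 minutes after A's return flight has left, giving 180 minutes of waiting on top of $620 in tickets.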

Random search

Random search serves as the baseline against which the other algorithms are evaluated.

domain is a list of (min,max) pairs, one for each person's outbound flight and one for each return flight; every flight index lies between 0 and 9.

domain=[(0,9)]*(len(optimization.people)*2)
def randomoptimize(domain,costf):
    best=99999999
    bestr=None
    for i in range(1000):
        #Create a random solution, e.g. r=[1,4,3,2,7,3,6,3,2,4,5,3]
        r=[random.randint(domain[j][0],domain[j][1]) for j in range(len(domain))]
        cost=costf(r)
        #Keep the solution if it is the best so far
        if cost<best:
            best=cost
            bestr=r
    return bestr
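Random search can be tried without any flight data by giving it a toy cost function. The sketch below is an illustration under assumptions: toycost and the tries parameter are made up here, and the function is a compact variant rather than the listing from optimization.py (Python 3 syntax).

```python
import random

def randomoptimize(domain, costf, tries=1000):
    """Draw random solutions and keep the cheapest one seen."""
    best, bestr = None, None
    for _ in range(tries):
        r = [random.randint(lo, hi) for lo, hi in domain]
        cost = costf(r)
        if best is None or cost < best:
            best, bestr = cost, r
    return bestr

def toycost(v):
    """Hypothetical cost: squared distance of every entry from 3."""
    return sum((x - 3) ** 2 for x in v)

random.seed(0)                      # fixed seed so the run is repeatable
domain = [(0, 9)] * 4
sol = randomoptimize(domain, toycost)
print(sol, toycost(sol))
```

The returned list is the best of the 1000 guesses; nothing guarantees that it is the true minimum [3, 3, 3, 3].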

Hill climbing

Trying solutions at random is very inefficient, because it makes no use of the good solutions that have already been found.
Hill climbing starts with a random schedule and looks at all of its neighbors, that is, all schedules in which one person takes a slightly earlier or slightly later flight than in the current schedule. The cost of each neighbor is calculated, and the neighbor with the lowest cost becomes the new solution. This repeats until no neighbor improves the cost.

As before, domain is the list of (min,max) flight-index ranges, one entry per flight.

def hillclimb(domain,costf):
    #Create a random solution sol in the same format as r
    sol=[random.randint(domain[i][0],domain[i][1]) for i in range(len(domain))]
    #Main loop
    while 1:
        #Create a list of neighboring solutions
        neighbors=[]
        for j in range(len(domain)):
            #One step away from the current value in each direction
            if sol[j]>domain[j][0]:
                #Copy of sol with element j decreased by one
                neighbors.append(sol[0:j]+[sol[j]-1]+sol[j+1:])
            if sol[j]<domain[j][1]:
                #Copy of sol with element j increased by one
                neighbors.append(sol[0:j]+[sol[j]+1]+sol[j+1:])
        #Find the best solution among the neighbors
        current=costf(sol)
        best=current
        for j in range(len(neighbors)):
            cost=costf(neighbors[j])
            if cost<best:
                best=cost
                sol=neighbors[j]
        #Stop when no neighbor is an improvement
        if best==current:
            break
    return sol

The returned sol is the best schedule that hill climbing found, in the same list format as r.

sol=optimization.hillclimb(domain,optimization.schedulecost)
optimization.schedulecost(sol)
#Print flight schedule
#sol=[1,4,3,2,7,3,6,3,2,4,5,3]
optimization.printschedule(sol)

The final result may be a local optimum rather than the global optimum.
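A one-variable example shows how the starting point decides which optimum is reached. In this sketch the cost table is made up, and hillclimb_from is a hypothetical variant that takes a fixed start instead of a random one so that the runs are repeatable (Python 3 syntax).

```python
def hillclimb_from(start, domain, costf):
    """Hill climbing from a given start; stops when no neighbor is cheaper."""
    sol = list(start)
    while True:
        neighbors = []
        for j in range(len(domain)):
            if sol[j] > domain[j][0]:
                neighbors.append(sol[:j] + [sol[j] - 1] + sol[j + 1:])
            if sol[j] < domain[j][1]:
                neighbors.append(sol[:j] + [sol[j] + 1] + sol[j + 1:])
        if not neighbors:
            return sol
        best = min(neighbors, key=costf)
        if costf(best) >= costf(sol):
            return sol
        sol = best

# Made-up costs for x = 0..9: local minimum at x=2, global minimum at x=7
costs = [5, 3, 2, 4, 6, 4, 1, 0, 2, 5]

def bumpy(v):
    return costs[v[0]]

domain = [(0, 9)]
print(hillclimb_from([0], domain, bumpy))  # [2], stuck in the local minimum
print(hillclimb_from([9], domain, bumpy))  # [7], reaches the global minimum
```

Starting at 0 the search stops at x=2 because both neighbors cost more, even though x=7 is cheaper overall.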

Simulated annealing

Simulated annealing can avoid getting stuck in a local optimum.

In some cases it is necessary to move to a worse solution before a better one can be reached. Simulated annealing works not only because it always accepts a better solution, but also because it accepts worse solutions early in the annealing process. As annealing goes on, the algorithm becomes less and less likely to accept a worse solution, until at the end it accepts only better ones.
A worse solution is accepted with probability e^(-(eb-ea)/T), where ea is the current cost, eb the new cost and T the temperature, so the algorithm tends to accept a slightly worse solution rather than a very bad one.
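The effect of temperature can be seen by evaluating the acceptance rule that annealingoptimize uses, pow(math.e,-(eb-ea)/T), for a few values. The helper name acceptance_probability and the sample costs are assumptions chosen for illustration (Python 3 syntax).

```python
import math

def acceptance_probability(ea, eb, T):
    """Probability of moving from cost ea to cost eb at temperature T."""
    if eb < ea:
        return 1.0                      # a better solution is always accepted
    return math.exp(-(eb - ea) / T)     # a worse one only sometimes

# Current cost 100; compare a slightly worse (110) and a much worse (200) candidate
for T in (10000.0, 100.0, 1.0):
    slightly = acceptance_probability(100, 110, T)
    much = acceptance_probability(100, 200, T)
    print("T=%8.1f  slightly worse: %.4f  much worse: %.4f" % (T, slightly, much))
```

At T=10000 both candidates are accepted almost every time. At T=100 the slightly worse one is accepted about 90% of the time and the much worse one about 37% of the time. At T=1 both probabilities are close to zero.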

def annealingoptimize(domain,costf,T=10000.0,cool=0.95,step=1):
    #Random initial solution
    vec=[random.randint(domain[i][0],domain[i][1]) for i in range(len(domain))]
    while T>0.1:
        #Choose one of the indices
        i=random.randint(0,len(domain)-1)
        #Choose a direction and amount to change it by
        dir=random.randint(-step,step)
        #Copy the solution and change one value, staying inside the domain
        vecb=vec[:]
        vecb[i]+=dir
        if vecb[i]<domain[i][0]: vecb[i]=domain[i][0]
        elif vecb[i]>domain[i][1]: vecb[i]=domain[i][1]
        #Cost of the current and of the new solution
        ea=costf(vec)
        eb=costf(vecb)
        #Always accept a better solution, a worse one with probability e^(-(eb-ea)/T)
        if (eb<ea or random.random()<pow(math.e,-(eb-ea)/T)):
            vec=vecb
        #Lower the temperature
        T=T*cool
    return vec

sol=optimization.annealingoptimize(domain,optimization.schedulecost)
optimization.schedulecost(sol)
#Print flight schedule
#sol=[1,4,3,2,7,3,6,3,2,4,5,3]
optimization.printschedule(sol)

Genetic algorithm

First, a set of random solutions is generated; this set is called the population.

At each step of the optimization, the cost function is calculated for the whole population to obtain a ranked list of solutions.

After the solutions are ranked, a new population (the next generation) is created.

The solutions at the top of the current ranking are added to the new population unchanged. This is called elitism.

The rest of the new population consists of new solutions created by modifying the best solutions.
There are two ways to modify a solution:
Mutation: make a small random change to one number in an existing solution.
Crossover: combine two of the best solutions in some way, for example by taking the start of one and the rest of the other.

These steps are repeated for a fixed number of generations.
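The two ways of modifying a solution can be sketched on short lists. This is an illustration under assumptions: the six-slot domain and the parent lists a and b are made up, and mutate here returns an unchanged copy when the chosen number cannot move in either direction (Python 3 syntax).

```python
import random

domain = [(0, 9)] * 6    # hypothetical domain: six slots, each from 0 to 9

def mutate(vec, step=1):
    """Change one randomly chosen number by step, staying inside the domain."""
    i = random.randint(0, len(domain) - 1)
    if random.random() < 0.5 and vec[i] - step >= domain[i][0]:
        return vec[:i] + [vec[i] - step] + vec[i + 1:]
    elif vec[i] + step <= domain[i][1]:
        return vec[:i] + [vec[i] + step] + vec[i + 1:]
    return vec[:]        # no legal move: unchanged copy

def crossover(r1, r2):
    """Take the start of r1 and the rest of r2, split at a random point."""
    i = random.randint(1, len(domain) - 2)
    return r1[:i] + r2[i:]

a = [1, 4, 3, 2, 7, 3]
b = [5, 5, 5, 5, 5, 5]
random.seed(1)           # fixed seed so the run is repeatable
print(mutate(a))         # differs from a in at most one position
print(crossover(a, b))   # a prefix of a followed by a suffix of b
```

Both functions build new lists and leave their inputs untouched, so the elite solutions survive unchanged while their modified copies fill the rest of the population.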

def geneticoptimize(domain,costf,popsize=50,step=1,mutprob=0.2,elite=0.2,maxiter=100):
    #Mutation: change one number by step
    def mutate(vec):
        i=random.randint(0,len(domain)-1)
        if random.random()<0.5 and vec[i]>domain[i][0]:
            return vec[0:i]+[vec[i]-step]+vec[i+1:]
        elif vec[i]<domain[i][1]:
            return vec[0:i]+[vec[i]+step]+vec[i+1:]
        #No change possible in the chosen direction: return an unchanged copy
        return vec[:]
    #Crossover: start of r1 followed by the rest of r2
    def crossover(r1,r2):
        i=random.randint(1,len(domain)-2)
        return r1[0:i]+r2[i:]
    #Build the initial population
    pop=[]
    for i in range(popsize):
        vec=[random.randint(domain[j][0],domain[j][1]) for j in range(len(domain))]
        pop.append(vec)
    #Number of winners kept from each generation
    topelite=int(elite*popsize)
    #Main loop
    for i in range(maxiter):
        scores=[(costf(v),v) for v in pop]
        scores.sort()
        ranked=[v for (s,v) in scores]
        #Start with the pure winners
        pop=ranked[0:topelite]
        #Add mutated and bred forms of the winners
        while len(pop)<popsize:
            if random.random()<mutprob:
                #Mutation
                c=random.randint(0,topelite)
                pop.append(mutate(ranked[c]))
            else:
                #Crossover
                c1=random.randint(0,topelite)
                c2=random.randint(0,topelite)
                pop.append(crossover(ranked[c1],ranked[c2]))
        #Print the current best cost
        print(scores[0][0])
    #Return the best solution, i.e. the best schedule found
    return scores[0][1]

popsize: the size of the population

mutprob: the probability that a new member of the population is created by mutation rather than by crossover

elite: the fraction of the population that is considered good enough to pass unchanged into the next generation

maxiter: the number of generations to run

sol=optimization.geneticoptimize(domain,optimization.schedulecost)
optimization.schedulecost(sol)
#Print flight schedule
#sol=[1,4,3,2,7,3,6,3,2,4,5,3]
optimization.printschedule(sol)

Real flight search

Kayak API

Real flight data can be obtained through the XML interface of Kayak.

minidom package

minidom is a standard way of treating an XML document as a tree of objects. The package takes an open XML file as input and returns an object from which information can easily be extracted.
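A short sketch of the object tree, using parseString on a literal instead of an open file. The XML text is made up for illustration and is not an actual Kayak response (Python 3 syntax).

```python
import xml.dom.minidom

# Made-up XML for illustration only
text = "<data><rec>Record One</rec><rec>Record Two</rec></data>"

dom = xml.dom.minidom.parseString(text)

# All elements with a given tag name, in document order
recs = dom.getElementsByTagName("rec")
print(len(recs))

# The text inside an element is a child node; .data holds its string
print(recs[0].firstChild.data)

# The root element of the document
print(dom.documentElement.tagName)
```

getElementsByTagName followed by firstChild.data is the same pattern that is used to read the contents of a single tag such as sid.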

Flight search

Create a new file called kayak.py.
It contains the code that uses the developer key to get a new Kayak session and parses the returned XML to get the contents of the sid tag.

import time
import urllib2
import xml.dom.minidom
kayakkey='YOURKEYHERE'
def getkayaksession():
    #Construct the URL that starts a session
    url='http://www.kayak.com/k/ident/apisession?token=%s&version=1'%kayakkey
    #Parse the resulting XML
    doc=xml.dom.minidom.parseString(urllib2.urlopen(url).read())
    #Read the contents of the sid tag
    sid=doc.getElementsByTagName('sid')[0].firstChild.data
    return sid

Optimizing student dorm assignments

Omitted.

Network visualization