1
|
- Kevin Leyton-Brown
- Eugene Nudelman
- Yoav Shoham
- Computer Science, Stanford University
|
2
|
- Recent trend: study of average/empirical hardness as opposed to
worst-case complexity (NP-hardness) [Cheeseman et al.; Selman et al.]
- Our proposal: predict the running time of an algorithm on a particular
instance based on features of that instance
- Today:
- a methodology for doing this
- its application to the combinatorial auction winner determination
problem (WDP)
|
3
|
- Predict running time
- for its own sake
- build algorithm portfolios
- Theoretical understanding of hardness
- tune distributions for hardness
- improve algorithms
- Problem specific
- WDP: design bidding rules
|
4
|
- Decision problems:
- phase transitions in solvability, corresponding to hardness spike [Cheeseman
et al.; Selman et al.]
- solution invariants: e.g., backbone [Gomes et al.]
- Optimization problems:
- experimental:
- reduce to decision problem [Zhang
et al.]
- introduce backbone concepts [Walsh et al.]
- theoretical:
- polynomial/exponential transition in search algorithms [Zhang]
- predict A* nodes expanded for problem distribution [Korf, Reid]
- Learning
- dynamic restart policies [Kautz et al.]
|
5
|
- Auctioneer sells a set of non-homogeneous items
- Bidders often have complex valuations
- complementarities
- e.g. V(TV & VCR) > V(TV) + V(VCR)
- substitutabilities
- e.g. V(TV1 & TV2) < V(TV1) + V(TV2)
- Solution: allow bids on bundles of goods
- achieves a higher revenue and social welfare than separate auctions
- Two hard problems:
- Expressing valuations
- Determining optimal allocation
|
6
|
- Equivalent to weighted set packing
- Input: m bids
- Objective: find revenue-maximizing non-conflicting allocation
- Even constant factor approximation is NP-Hard
- Square-root approximation known
- Polynomial in the number of bids
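The objective above is weighted set packing: pick a subset of bids with pairwise-disjoint goods that maximizes total price. A minimal brute-force sketch makes the objective concrete (toy instance sizes only; a real solver such as CPLEX uses branch-and-bound, and all names here are illustrative):

```python
from itertools import combinations

def solve_wdp(bids):
    """Brute-force winner determination: find the revenue-maximizing
    set of non-conflicting bids. Each bid is a (price, goods) pair,
    with goods given as a frozenset. Exponential in the number of
    bids; shown only to pin down the objective."""
    best_revenue, best_alloc = 0, []
    n = len(bids)
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            goods_used = set()
            conflict = False
            for i in subset:
                if goods_used & bids[i][1]:   # two bids share a good
                    conflict = True
                    break
                goods_used |= bids[i][1]
            if not conflict:
                revenue = sum(bids[i][0] for i in subset)
                if revenue > best_revenue:
                    best_revenue, best_alloc = revenue, list(subset)
    return best_revenue, best_alloc

bids = [(10, frozenset({"TV", "VCR"})),   # bundle bid
        (6, frozenset({"TV"})),
        (5, frozenset({"VCR"}))]
print(solve_wdp(bids))  # the two singleton bids beat the bundle
```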
|
7
|
- Difficulty: highly parameterized, complex distributions
- Hard to analyze theoretically
- large variation in edge costs and branching factors throughout the
search tree [Korf, Reid, Zhang]
- Too many parameters to vary systematically [Walsh et al., Gomes et al.]
- Parameters affect expected optimum: difficult to transform to decision
problem [Zhang et al.]
|
8
|
- Select algorithm
- Select set of input distributions
- Factor out known sources of hardness
- Choose features
- Generate instances
- Compute running time, features
- Learn a model of running time
|
9
|
- Select algorithm: ILOG’s CPLEX 7.1
- Select set of input distributions
- Factor out known sources of hardness
- Choose features
- Generate instances
- Compute running time, features
- Learn a model of running time
|
10
|
- Select algorithm
- Select set of input distributions
- Factor out known sources of hardness
- Choose features
- Generate instances
- Compute running time, features
- Learn a model of running time
|
11
|
- Legacy (7 distributions)
- sample bid sizes/prices independently from simple statistical
distributions
- Combinatorial Auctions Test Suite (CATS)
- Attempted to model bidder valuations to provide more motivated CA
distributions
- regions: real estate
- arbitrary: complementarity described by weighted graph
- matching: FAA take-off & landing auctions
- scheduling: single resource, multiple deadlines for each agent [Wellman]
|
12
|
- Select algorithm
- Select set of input distributions
- Factor out known sources of hardness
- Choose features
- Generate instances
- Compute running time, features
- Learn a model of running time
|
13
|
- Some sources of hardness well-understood
- hold these constant to focus on unknown sources of hardness
- Common: input size
- Problem size is affected by preprocessing techniques! (e.g.
arc-consistency)
- WDP: dominated bids can be removed
- (raw #bids, #goods) is a very misleading measure of size for legacy
distributions
- we fix size as (#non-dominated bids, #goods)
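The dominated-bid removal mentioned above can be sketched as follows: a bid can never win if some other bid asks for a subset of its goods at an equal-or-higher price. This is a sketch under that standard definition; the preprocessing used for the talk's experiments may differ in details:

```python
def remove_dominated(bids):
    """Drop dominated bids. Bid i (price p_i, goods g_i, a frozenset)
    is dominated if another bid j has p_j >= p_i and g_j a subset of
    g_i (excluding exact duplicates, which are kept)."""
    keep = []
    for i, (p_i, g_i) in enumerate(bids):
        dominated = any(
            p_j >= p_i and g_j <= g_i and (p_j, g_j) != (p_i, g_i)
            for j, (p_j, g_j) in enumerate(bids) if j != i
        )
        if not dominated:
            keep.append((p_i, g_i))
    return keep

bids = [(6, frozenset({"A"})),
        (5, frozenset({"A", "B"})),   # dominated by the first bid
        (3, frozenset({"C"}))]
print(remove_dominated(bids))
```

Counting bids only after this step is what makes (#non-dominated bids, #goods) a more honest size measure than the raw counts.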
|
14
|
|
15
|
- Select algorithm
- Select set of input distributions
- Factor out known sources of hardness
- Choose features
- Generate instances
- Compute running time, features
- Learn a model of running time
|
16
|
- No automatic way to construct features
- must come from domain knowledge
- We require features to be:
- polynomial-time computable
- distribution-independent
- We identified 35 features
- after using various statistical feature selection techniques, we were
left with 25
|
17
|
- Bid Good Graph (BGG)
- Bid node degree stats
- Good node degree stats
- Price-based features
- std. deviation
- stdev price/#goods
- stdev price/√#goods
- Bid Graph (BG)
- node degree stats
- edge density
- clustering coef. and deviation
- avg. min. path length
- ratio of clustering coef. to avg. min. path length
- node eccentricity stats
- LP Relaxation
- L1, L2, L∞ norms of integer
slack vector
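As an illustration of the Bid Graph group, here is a sketch of two of its cheaper features: node-degree statistics and edge density. Nodes are bids and an edge joins two bids that share a good; the dictionary keys are my own labels, not the paper's:

```python
from statistics import mean, stdev

def bid_graph_features(bids):
    """Compute simple Bid Graph features for a list of bids, each a
    set of goods. An edge joins two bids that request a common good
    (i.e., that conflict in any allocation)."""
    n = len(bids)
    degree = [0] * n
    edges = 0
    for i in range(n):
        for j in range(i + 1, n):
            if bids[i] & bids[j]:        # the two bids share a good
                degree[i] += 1
                degree[j] += 1
                edges += 1
    density = edges / (n * (n - 1) / 2) if n > 1 else 0.0
    return {"avg_degree": mean(degree),
            "stdev_degree": stdev(degree) if n > 1 else 0.0,
            "edge_density": density}

print(bid_graph_features([{"a", "b"}, {"b", "c"}, {"d"}]))
```

All of these run in polynomial time and make no reference to the generating distribution, satisfying the two requirements on the previous slide.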
|
18
|
- Select algorithm
- Select set of input distributions
- Factor out known sources of hardness
- Choose features
- Generate instances
- Compute running time, features
- Learn a model of running time
|
19
|
- Sample parameters uniformly from range of acceptable values
- 3 separate datasets:
- 256 goods, 1000 non-dominated bids
- 144 goods, 1000 non-dominated bids
- 64 goods, 2000 non-dominated bids
- 4500 instances/dataset, from 9 distributions
- Collecting data took approximately 3 years of CPU time! (550 MHz Xeons,
Linux 2.12)
- Running times varied from 0.01 sec to 22 hours (CPLEX capped at 130000
nodes)
|
20
|
|
21
|
- Select algorithm
- Select set of input distributions
- Factor out known sources of hardness
- Choose features
- Generate instances
- Compute running time, features
- Learn a model of running time
|
22
|
- Classification (e.g. hard vs. easy): a misleading error measure
- Statistical regression: learn a continuous function of features that
predicts log of running time
- Supervised learning: data broken into 80% training set, 20% test set
- Started with simplest technique: linear regression
- find a hyperplane that minimizes root mean squared error (RMSE) on
training data
- Linear regression is useful:
- as a (surprisingly good) baseline
- yields a very interpretable model with understandable variables
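The learning step above can be sketched in a few lines, assuming a feature matrix X and a vector of measured runtimes; the 80/20 split and RMSE on the log of running time follow the slide, while the function and variable names are my own:

```python
import numpy as np

def fit_runtime_model(X, runtimes, train_frac=0.8, seed=0):
    """Linear regression predicting log10(running time) from instance
    features, with a random train/test split. Returns the weight
    vector (last entry is the intercept) and RMSE on both sets."""
    y = np.log10(runtimes)                     # predict log of runtime
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(train_frac * len(y))
    tr, te = idx[:cut], idx[cut:]
    A = np.column_stack([X, np.ones(len(y))])  # append intercept term
    w, *_ = np.linalg.lstsq(A[tr], y[tr], rcond=None)

    def rmse(rows):
        return float(np.sqrt(np.mean((A[rows] @ w - y[rows]) ** 2)))

    return w, rmse(tr), rmse(te)
```

Because the model is a hyperplane over named features, the learned weights can be read off directly, which is what makes the baseline interpretable.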
|
23
|
|
24
|
|
25
|
|
26
|
- Linear regression doesn’t consider interactions between variables;
likely to underfit data
- Consider 2nd degree polynomials
- Variables = pairwise products of original features
- total of 325 variables
- (cf. clauses/variables)
- More predictability, less interpretability
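One expansion consistent with the slide's count: all products x_i * x_j with i <= j (squares included), which for 25 base features gives 25 * 26 / 2 = 325 variables. This is a sketch of that construction; the talk's exact basis may bundle the raw features differently:

```python
from itertools import combinations_with_replacement

def quadratic_expand(x):
    """Second-degree polynomial basis: every pairwise product of the
    base features, including each feature's square."""
    return [x[i] * x[j] for i, j in
            combinations_with_replacement(range(len(x)), 2)]

print(len(quadratic_expand([0.0] * 25)))  # 325 variables
```

The expanded features then feed into the same linear regression, trading interpretability for the ability to capture pairwise interactions.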
|
27
|
|
28
|
|
29
|
|
30
|
|
31
|
- Constructing algorithm portfolios
- combine several uncorrelated algorithms
- good initial results for WDP
- Tuning distributions for hardness
- releasing new version of CATS
|
32
|
- Algorithms are predictable
- Empirical hardness can be studied in a disciplined way
- Once again: Structure matters!
- Uniform distributions aren’t the best testbeds
- Constraint graphs are very useful
- Hypothesis: good heuristics make good features (e.g. LP)
- Our methodology is general and can be applied to other problems!
|