1
|
- Kevin Leyton-Brown
- Eugene Nudelman
- Yoav Shoham
- Computer Science, Stanford University
|
2
|
- Recent trend: study of average/empirical hardness as opposed to
worst-case complexity (NP-hardness) [Cheeseman et al.; Selman et al.]
- Our proposal: predict the running time of an algorithm on a particular
instance based on features of that instance
- Today:
- a methodology for doing this
- its application to the combinatorial auction winner determination
problem (WDP)
|
3
|
- Predict running time
- for its own sake
- build algorithm portfolios
- Theoretical understanding of hardness
- tune distributions for hardness
- improve algorithms
- Problem specific
- WDP: design bidding rules
|
4
|
- Decision problems:
- phase transitions in solvability, corresponding to hardness spike [Cheeseman
et al.; Selman et al.]
- solution invariants: e.g., backbone [Gomes et al.]
- Optimization problems:
- experimental:
- reduce to decision problem [Zhang
et al.]
- introduce backbone concepts [Walsh et al.]
- theoretical:
- polynomial/exponential transition in search algorithms [Zhang]
- predict A* nodes expanded for problem distribution [Korf, Reid]
- Learning
- dynamic restart policies [Kautz et al.]
|
5
|
- Auctioneer sells a set of non-homogeneous items
- Bidders often have complex valuations
- complementarities
- e.g. V(TV & VCR) > V(TV) + V(VCR)
- substitutabilities
- e.g. V(TV1 & TV2) < V(TV1) + V(TV2)
- Solution: allow bids on bundles of goods
- achieves a higher revenue and social welfare than separate auctions
- Two hard problems:
- Expressing valuations
- Determining optimal allocation
|
6
|
- Equivalent to weighted set packing
- Input: m bids
- Objective: find revenue-maximizing non-conflicting allocation
- Even constant factor approximation is NP-Hard
- Square-root approximation known
- Polynomial in the number of bids
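The objective above is weighted set packing: pick a subset of bids with pairwise-disjoint goods that maximizes total price. A minimal brute-force sketch makes the objective concrete (toy instance sizes only; a real solver such as CPLEX uses branch-and-bound, and all names here are illustrative):

```python
from itertools import combinations

def solve_wdp(bids):
    """Brute-force winner determination: find the revenue-maximizing
    set of non-conflicting bids. Each bid is a (price, goods) pair,
    with goods given as a frozenset. Exponential in the number of
    bids; shown only to pin down the objective."""
    best_revenue, best_alloc = 0, []
    n = len(bids)
    for r in range(1, n + 1):
        for subset in combinations(range(n), r):
            goods_used = set()
            conflict = False
            for i in subset:
                if goods_used & bids[i][1]:   # two bids share a good
                    conflict = True
                    break
                goods_used |= bids[i][1]
            if not conflict:
                revenue = sum(bids[i][0] for i in subset)
                if revenue > best_revenue:
                    best_revenue, best_alloc = revenue, list(subset)
    return best_revenue, best_alloc

bids = [(10, frozenset({"TV", "VCR"})),   # bundle bid
        (6, frozenset({"TV"})),
        (5, frozenset({"VCR"}))]
print(solve_wdp(bids))  # the two singleton bids beat the bundle
```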
|
7
|
- Difficulty: highly parameterized, complex distributions
- Hard to analyze theoretically
- large variation in edge costs and branching factors throughout the
search tree [Korf, Reid, Zhang]
- Too many parameters to vary systematically [Walsh et al., Gomes et al.]
- Parameters affect expected optimum: difficult to transform to decision
problem [Zhang et al.]
|
8
|
- Select algorithm
- Select set of input distributions
- Factor out known sources of hardness
- Choose features
- Generate instances
- Compute running time, features
- Learn a model of running time
|
9
|
- Select algorithm: ILOG’s CPLEX 7.1
- Select set of input distributions
- Factor out known sources of hardness
- Choose features
- Generate instances
- Compute running time, features
- Learn a model of running time
|
10
|
- Select algorithm
- Select set of input distributions
- Factor out known sources of hardness
- Choose features
- Generate instances
- Compute running time, features
- Learn a model of running time
|
11
|
- Legacy (7 distributions)
- sample bid sizes/prices independently from simple statistical
distributions
- Combinatorial Auctions Test Suite (CATS)
- Attempted to model bidder valuations to provide more motivated CA
distributions
- regions: real estate
- arbitrary: complementarity described by weighted graph
- matching: FAA take-off & landing auctions
- scheduling: single resource, multiple deadlines for each agent [Wellman]
|
12
|
- Select algorithm
- Select set of input distributions
- Factor out known sources of hardness
- Choose features
- Generate instances
- Compute running time, features
- Learn a model of running time
|
13
|
- Some sources of hardness well-understood
- hold these constant to focus on unknown sources of hardness
- Common: input size
- Problem size is affected by preprocessing techniques! (e.g.
arc-consistency)
- WDP: dominated bids can be removed
- (raw #bids, #goods) is a very misleading measure of size for legacy
distributions
- we fix size as (#non-dominated bids, #goods)
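The dominated-bid removal mentioned above can be sketched as follows: a bid can never win if some other bid asks for a subset of its goods at an equal-or-higher price. This is a sketch under that standard definition; the preprocessing used for the talk's experiments may differ in details:

```python
def remove_dominated(bids):
    """Drop dominated bids. Bid i (price p_i, goods g_i, a frozenset)
    is dominated if another bid j has p_j >= p_i and g_j a subset of
    g_i (excluding exact duplicates, which are kept)."""
    keep = []
    for i, (p_i, g_i) in enumerate(bids):
        dominated = any(
            p_j >= p_i and g_j <= g_i and (p_j, g_j) != (p_i, g_i)
            for j, (p_j, g_j) in enumerate(bids) if j != i
        )
        if not dominated:
            keep.append((p_i, g_i))
    return keep

bids = [(6, frozenset({"A"})),
        (5, frozenset({"A", "B"})),   # dominated by the first bid
        (3, frozenset({"C"}))]
print(remove_dominated(bids))
```

Counting bids only after this step is what makes (#non-dominated bids, #goods) a more honest size measure than the raw counts.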
|
14
|
|
15
|
- Select algorithm
- Select set of input distributions
- Factor out known sources of hardness
- Choose features
- Generate instances
- Compute running time, features
- Learn a model of running time
|
16
|
- No automatic way to construct features
- must come from domain knowledge
- We require features to be:
- polynomial-time computable
- distribution-independent
- We identified 35 features
- after using various statistical feature selection techniques, we were
left with 25
|
17
|
- Bid Good Graph (BGG)
- Bid node degree stats
- Good node degree stats
- Price-based features
- std. deviation
- stdev price/#goods
- stdev price/√#goods
- Bid Graph (BG)
- node degree stats
- edge density
- clustering coef. and deviation
- avg. min. path length
- ratio of clustering coef. to avg. min. path length
- node eccentricity stats
- LP Relaxation
- L1, L2, L∞ norms of integer
slack vector
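As an illustration of the Bid Graph group, here is a sketch of two of its cheaper features: node-degree statistics and edge density. Nodes are bids and an edge joins two bids that share a good; the dictionary keys are my own labels, not the paper's:

```python
from statistics import mean, stdev

def bid_graph_features(bids):
    """Compute simple Bid Graph features for a list of bids, each a
    set of goods. An edge joins two bids that request a common good
    (i.e., that conflict in any allocation)."""
    n = len(bids)
    degree = [0] * n
    edges = 0
    for i in range(n):
        for j in range(i + 1, n):
            if bids[i] & bids[j]:        # the two bids share a good
                degree[i] += 1
                degree[j] += 1
                edges += 1
    density = edges / (n * (n - 1) / 2) if n > 1 else 0.0
    return {"avg_degree": mean(degree),
            "stdev_degree": stdev(degree) if n > 1 else 0.0,
            "edge_density": density}

print(bid_graph_features([{"a", "b"}, {"b", "c"}, {"d"}]))
```

All of these run in polynomial time and make no reference to the generating distribution, satisfying the two requirements on the previous slide.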
|
18
|
- Select algorithm
- Select set of input distributions
- Factor out known sources of hardness
- Choose features
- Generate instances
- Compute running time, features
- Learn a model of running time
|
19
|
- Sample parameters uniformly from range of acceptable values
- 3 separate datasets:
- 256 goods, 1000 non-dominated bids
- 144 goods, 1000 non-dominated bids
- 64 goods, 2000 non-dominated bids
- 4500 instances/dataset, from 9 distributions
- Collecting data took approximately 3 years of CPU time! (550 MHz Xeons,
Linux 2.12)
- Running times varied from 0.01 sec to 22 hours (CPLEX capped at 130000
nodes)
|
20
|
|
21
|
- Select algorithm
- Select set of input distributions
- Factor out known sources of hardness
- Choose features
- Generate instances
- Compute running time, features
- Learn a model of running time
|
22
|
- Classification (e.g. hard vs. easy): a misleading error measure
- Statistical regression: learn a continuous function of features that
predicts log of running time
- Supervised learning: data broken into 80% training set, 20% test set
- Started with simplest technique: linear regression
- find a hyperplane that minimizes root mean squared error (RMSE) on
training data
- Linear regression is useful:
- as a (surprisingly good) baseline
- yields a very interpretable model with understandable variables
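The learning step above can be sketched in a few lines, assuming a feature matrix X and a vector of measured runtimes; the 80/20 split and RMSE on the log of running time follow the slide, while the function and variable names are my own:

```python
import numpy as np

def fit_runtime_model(X, runtimes, train_frac=0.8, seed=0):
    """Linear regression predicting log10(running time) from instance
    features, with a random train/test split. Returns the weight
    vector (last entry is the intercept) and RMSE on both sets."""
    y = np.log10(runtimes)                     # predict log of runtime
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(train_frac * len(y))
    tr, te = idx[:cut], idx[cut:]
    A = np.column_stack([X, np.ones(len(y))])  # append intercept term
    w, *_ = np.linalg.lstsq(A[tr], y[tr], rcond=None)

    def rmse(rows):
        return float(np.sqrt(np.mean((A[rows] @ w - y[rows]) ** 2)))

    return w, rmse(tr), rmse(te)
```

Because the model is a hyperplane over named features, the learned weights can be read off directly, which is what makes the baseline interpretable.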
|
23
|
|
24
|
|
25
|
|
26
|
- Linear regression doesn’t consider interactions between variables;
likely to underfit data
- Consider 2nd degree polynomials
- Variables = pairwise products of original features
- total of 325 variables
- (cf. clauses/variables)
- More predictability, less interpretability
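One expansion consistent with the slide's count: all products x_i * x_j with i <= j (squares included), which for 25 base features gives 25 * 26 / 2 = 325 variables. This is a sketch of that construction; the talk's exact basis may bundle the raw features differently:

```python
from itertools import combinations_with_replacement

def quadratic_expand(x):
    """Second-degree polynomial basis: every pairwise product of the
    base features, including each feature's square."""
    return [x[i] * x[j] for i, j in
            combinations_with_replacement(range(len(x)), 2)]

print(len(quadratic_expand([0.0] * 25)))  # 325 variables
```

The expanded features then feed into the same linear regression, trading interpretability for the ability to capture pairwise interactions.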
|
27
|
|
28
|
|
29
|
|
30
|
|
31
|
- Constructing algorithm portfolios
- combine several uncorrelated algorithms
- good initial results for WDP
- Tuning distributions for hardness
- releasing new version of CATS
|
32
|
- Algorithms are predictable
- Empirical hardness can be studied in a disciplined way
- Once again: Structure matters!
- Uniform distributions aren’t the best testbeds
- Constraint graphs are very useful
- Hypothesis: good heuristics make good features (e.g. LP)
- Our methodology is general and can be applied to other problems!
|