2017

Deep & Cross Network for Ad Click Predictions

Mingliang Wang

citations

Cite Score

48

AI summary

This paper introduces the Deep & Cross Network (DCN) model for web-scale automatic feature learning, which efficiently captures feature interactions of bounded degrees and learns highly nonlinear interactions, achieving state-of-the-art performance on the Criteo CTR dataset.

Main Contributions

Introduces a novel cross network that explicitly applies feature crossing at each layer, efficiently learns predictive cross features of bounded degrees, and requires no manual feature engineering or exhaustive searching.
The cross network is simple yet effective. By design, the highest polynomial degree increases by one at each layer, so it is determined by layer depth. The network contains all cross terms of every degree up to that maximum, each with a different coefficient.
The cross network is memory efficient and easy to implement.
Experimental results demonstrate that, with a cross network, DCN achieves lower logloss than a DNN while using nearly an order of magnitude fewer parameters.
DCN outperforms state-of-the-art algorithms on both sparse and dense datasets, in terms of both model accuracy and memory usage.
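
The contributions above can be sketched in a few lines. This page does not state the layer formula, so the sketch assumes the cross-layer update from the published DCN paper, x_{l+1} = x_0 (x_l^T w_l) + b_l + x_l. The function names (`dot`, `cross_layer`, `cross_network`) and the toy inputs are hypothetical, and this is an illustration rather than the authors' implementation.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_layer(x0: Vector, xl: Vector, w: Vector, b: Vector) -> Vector:
    """One cross layer (assumed form): x_{l+1} = x0 * (xl . w) + b + xl.

    x0 * (xl . w) equals (x0 xl^T) w, so the d x d outer product
    is never materialized and each layer costs O(d) memory.
    """
    s = dot(xl, w)  # scalar projection of the current layer output
    return [x0_i * s + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]


def cross_network(x0: Vector, weights: List[Vector], biases: List[Vector]) -> Vector:
    """Stack cross layers; every layer multiplies by x0 once more,
    so the highest polynomial degree in x0 grows by one per layer."""
    x = x0
    for w, b in zip(weights, biases):
        x = cross_layer(x0, x, w, b)
    return x


if __name__ == "__main__":
    # Toy two-dimensional input with two cross layers and zero biases.
    x0 = [1.0, 2.0]
    weights = [[1.0, 0.0], [0.0, 1.0]]
    biases = [[0.0, 0.0], [0.0, 0.0]]
    print(cross_network(x0, weights, biases))
```

Because the residual term `xl` is added back at every layer, lower-degree terms survive alongside the new higher-degree ones, which is consistent with the claim that the network contains cross terms of all degrees up to the maximum.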

Abstract

Feature engineering has been the key to the success of many prediction models. However, the process is nontrivial and often requires manual feature engineering or exhaustive searching. DNNs are able to automatically learn feature interactions; however, they generate all the interactions implicitly, and are not necessarily efficient in learning all types of cross features. In this paper, we propose the Deep & Cross Network (DCN), which keeps the benefits of a DNN model and, beyond that, introduces a novel cross network that is more efficient in learning certain bounded-degree feature interactions. In particular, DCN explicitly applies feature crossing at each layer, requires no manual feature engineering, and adds negligible extra complexity to the DNN model. Our experimental results demonstrate its superiority over state-of-the-art algorithms on the CTR prediction dataset and a dense classification dataset, in terms of both model accuracy and memory usage.
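
To get a feel for the "negligible extra complexity" claim, here is a rough parameter count. It assumes that each cross layer holds one weight vector and one bias vector of the input dimension, and that the deep part is a plain fully connected stack. The function names, the input dimension, and the layer sizes are hypothetical and are not taken from the paper's experiments.

```python
from typing import Sequence


def cross_network_params(input_dim: int, num_layers: int) -> int:
    """Assumed layout: one weight vector and one bias vector of size
    input_dim per cross layer, so the count is linear in input_dim."""
    return 2 * input_dim * num_layers


def dense_network_params(input_dim: int, hidden_sizes: Sequence[int]) -> int:
    """Fully connected stack: a weight matrix plus a bias vector per layer."""
    total = 0
    fan_in = input_dim
    for width in hidden_sizes:
        total += fan_in * width + width
        fan_in = width
    return total


if __name__ == "__main__":
    d = 1000  # hypothetical size of the stacked embedding and dense input
    print("cross:", cross_network_params(d, num_layers=6))
    print("deep: ", dense_network_params(d, hidden_sizes=[1024, 1024]))
```

With these made-up sizes the cross network adds 12,000 parameters next to roughly two million in the deep stack, which illustrates why the cross part is cheap under the stated assumptions.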

References [18]

[1] Deep Residual Learning for Image Recognition

K. He, X. Zhang, S. Ren, Jian Sun - 2016

20 papers in library cite

This is simply amazing. Very very simple idea, totally revolutionary. No maths, just "it works!". Amazing.

[2] Adam: A Method for Stochastic Optimization

D. P. Kingma, Jimmy Lei Ba - 2014

49 papers in library cite

Amazing paper! Very well explained and huge impact. I am amazed that they made it so simple even though it requires a lot of mathematical background.

[3] Deep Learning

I. Goodfellow, Y. Bengio, A. Courville - 2016

5 papers in library cite

It's an awesome review for people that are not familiar with deep learning, but it's very basic as well. Not too relevant given the context of other papers I read, but huge impact in terms of scientific communication.

[4] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

S. Ioffe, Christian Szegedy - 2015

18 papers in library cite

Very good paper! Similar feel to ResNets: simple idea, elegant. Not too mathy.

[5] Deep Learning in Neural Networks: An Overview

Jürgen Schmidhuber - 2015

2 papers in library cite

34 pages, but it seems like a good overview.

[6] Residual Networks Behave Like Ensembles of Relatively Shallow Networks

A. Veit, M. J. Wilber, S. Belongie - 2016

4 papers in library cite

ResNets are ensembles.

[7] Principles of Mathematical Analysis

W. Rudin, Others - 1964

2 papers in library cite

[8] Deep Crossing: Web-Scale Modeling Without Manually Crafted Combinatorial Features

Y. Shan, T. R. Hoens, J. Jiao, Haiming Wang, D. Yu, J. Mao - 2016

1 paper in library cites

[9] Factorization Machines

S. Rendle - 2010

1 paper in library cites

[10] Factorization Machines With libFM

S. Rendle - 2012

1 paper in library cites

[11] Field Aware Factorization Machines for CTR Prediction

Y. Juan, Y. Zhuang, W. S. Chin, C. J. Lin - 2016

1 paper in library cites

[12] Field-Aware Factorization Machines in a Real-World Online Advertising System

Y. Juan, D. Lefortier, O. Chapelle - 2017

1 paper in library cites

[13] Higher-Order Factorization Machines

M. Blondel, A. Fujino, N. Ueda, M. Ishihata - 2016

1 paper in library cites

[14] Learning Polynomials With Neural Networks

G. Valiant - 2014

1 paper in library cites

[15] Sibyl: A System for Large Scale Supervised Machine Learning

K. Canini - 2012

1 paper in library cites

[16] Simple and Scalable Response Prediction for Display Advertising

O. Chapelle, E. Manavoglu, R. Rosales - 2015

1 paper in library cites

[17] Tensor Machines for Learning Target-Specific Polynomial Features

Jiyan Yang, A. Gittens - 2015

1 paper in library cites

[18] Wide & Deep Learning for Recommender Systems

H. T. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, Wenhao Chai, M. Ispir, Others - 2016

1 paper in library cites

Cited by 0 papers in your library

Cites 6 papers in your library

Read on November 23, 2025

Very nice idea, but it feels so unexplored! So many opportunities for improvement (e.g. attention!). Maybe in v2?
