-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy path9datapreprocessing.html
More file actions
138 lines (113 loc) · 7.84 KB
/
9datapreprocessing.html
File metadata and controls
138 lines (113 loc) · 7.84 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta name="description" content="Deep Learning Tutorial using Keras">
<meta name="author" content="Lindsey M Kitchell">
<title>Intro to Deep Learning</title>
<!-- Bootstrap core CSS -->
<link href="vendor/bootstrap/css/bootstrap.min.css" rel="stylesheet">
<!-- Custom styles for this template -->
<link href="css/simple-sidebar.css" rel="stylesheet">
<!-- fonts -->
<link href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:300,400,600,700&display=swap" rel="stylesheet">
</head>
<body>
<div class="d-flex" id="wrapper">
<!-- Sidebar -->
<div class="bg-light border-right" id="sidebar-wrapper">
<div class="sidebar-heading">Deep Learning With Keras</div>
<div class="list-group list-group-flush">
<a href="1introtodeeplearning.html" class="list-group-item list-group-item-action bg-light">1. Intro to Deep Learning</a>
<a href="2introtokeras.html" class="list-group-item list-group-item-action bg-light">2. Intro to Keras</a>
<a href="3mlpsinkeras.html" class="list-group-item list-group-item-action bg-light">3. MLPs in Keras</a>
<a href="4cnnsinkeras.html" class="list-group-item list-group-item-action bg-light">4. CNNs in Keras</a>
<a href="5activationfunctions.html" class="list-group-item list-group-item-action bg-light">5. Activation Functions</a>
<a href="6otherkerasfunctions.html" class="list-group-item list-group-item-action bg-light">6. Other Useful Keras Functions</a>
<a href="7lossfunctionsoptimizers.html" class="list-group-item list-group-item-action bg-light">7. Loss Functions and Optimizers</a>
<a href="8evaluatingnns.html" class="list-group-item list-group-item-action bg-light">8. Evaluating Neural Networks</a>
<a href="9datapreprocessing.html" class="list-group-item list-group-item-action bg-light">9. Data Preprocessing</a>
<a href="10regularization.html" class="list-group-item list-group-item-action bg-light">10. Regularization</a>
<a href="11hyperparametertuning.html" class="list-group-item list-group-item-action bg-light">11. Hyperparameter Tuning</a>
</div>
</div>
<!-- /#sidebar-wrapper -->
<!-- Page Content -->
<div id="page-content-wrapper">
<nav class="navbar navbar-expand-lg navbar-light bg-light border-bottom">
<button class="btn btn-primary" id="menu-toggle">Toggle Menu</button>
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarSupportedContent" aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="navbarSupportedContent">
<ul class="navbar-nav ml-auto mt-2 mt-lg-0">
<li class="nav-item active">
<a class="nav-link" href="index.html">Home <span class="sr-only">(current)</span></a>
</li>
<li class="nav-item">
<a class="nav-link" target="_blank" href="https://lindseykitchell.weebly.com/">About the Author</a>
</li>
<!--
<li class="nav-item dropdown">
<a class="nav-link dropdown-toggle" href="#" id="navbarDropdown" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
Dropdown
</a>
<div class="dropdown-menu dropdown-menu-right" aria-labelledby="navbarDropdown">
<a class="dropdown-item" href="#">Action</a>
<a class="dropdown-item" href="#">Another action</a>
<div class="dropdown-divider"></div>
<a class="dropdown-item" href="#">Something else here</a>
</div>
</li>
-->
</ul>
</div>
</nav>
<div class="container-fluid">
<h1 id="data-preprocessing">Data Preprocessing</h1>
<hr>
<p>Before feeding your data into a neural network there is some data preparation that must be done. The exact preprocessing that must be done will be different for each domain but we can cover the basics here. </p>
<h2 id="vectorization">Vectorization</h2>
<p>All inputs and targets (data and labels) fed into a neural network must be tensors of floating point data. The step of turning the data into a tensor is called <em>vectorization</em>. A tensor is a multidimensional array, a generalization of matrices to an arbitrary number of dimensions. For example, a matrix is a 2D tensor, a vector is a 1D tensor, a scalar is a 0D tensor etc. In python, these are represented using numpy arrays. Your input data should be in a numpy array with the data format 'float32'. Data labels can be vectorized using one-hot encoding, as we have seen already. </p>
<p>Example of data tensors:</p>
<h3 id="vector-data">Vector data</h3>
<p>Vector data should be combined into a 2D array with the size (samples, features). One row per sample, one column per feature.</p>
<h3 id="time-series-or-sequence-data">Time series or Sequence data</h3>
<p>Time series or sequence data should be in a 3D array. Each sample can be encoded as a 2D matrix (one row per feature, one column per timestep) and the samples are combined into a 3D array of size (samples, features, timesteps). </p>
<h3 id="image-data">Image data</h3>
<p>Image data should be in a 4D array of size (samples, image height, image width, number of channels). The number of channels referes to the color channels of the image. If it is a greyscale image, then the number of channels is 1. If it is a color image, then the number of channels is 3 (RGB or HSV). </p>
<h2 id="normalization">Normalization</h2>
<p>The input data should be normalized so that each feature has a mean of 0 and a standard deviation of 1. It is not good to feed in large values or heterogeneous data (e.g. data where one feature is in the 0-1 range and another is in the 200-300 range), as it will cause issues with the gradient descent. Instead, the data should have small values (most in the 0-1 range) and be homogeneous (all features have data in roughly the same range). </p>
<p>The normalization step is done by normalizing each feature independently to have a mean of 0 and noramlizing each feature independently to have a standard deviation of 1. This can be done manually or by using a function in sklearn.</p>
<h3 id="manually">Manually</h3>
<pre><code class="lang-python"><span class="hljs-attr">x</span> = data <span class="hljs-comment">#numpy array</span>
<span class="hljs-comment"># subtract the mean across the 0-axis (columns)</span>
x <span class="hljs-attr">-=</span> x.mean(<span class="hljs-attr">axis=0)</span>
<span class="hljs-comment"># divide by the standard deviation across the 0-axis (columns(</span>
x /= x.std(<span class="hljs-attr">axis=0)</span>
</code></pre>
<h3 id="sklearn">sklearn</h3>
<p>sklearn has a function called StandardScaler that will do this for you. <a href="http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html">sklearn documentation</a>.</p>
<pre><code class="lang-python"><span class="hljs-attr">X</span> = data
<span class="hljs-attr">scaler</span> = StandardScaler()
<span class="hljs-attr">X_normalized</span> = scaler.fit_transform(X)
</code></pre>
<p><strong>Please continue on to <a href="10regularization.html">Regularization</a>.</strong></p>
</div>
</div>
<!-- /#page-content-wrapper -->
</div>
<!-- /#wrapper -->
<!-- Bootstrap core JavaScript -->
<script src="vendor/jquery/jquery.min.js"></script>
<script src="vendor/bootstrap/js/bootstrap.bundle.min.js"></script>
<!-- Menu Toggle Script -->
<script>
$("#menu-toggle").click(function(e) {
e.preventDefault();
$("#wrapper").toggleClass("toggled");
});
</script>
</body>
</html>