Interesting. But then you can run another neural network on, say, 16x16 pixel data?
I have the feeling you can throw endless resources at the problem, although that may take second place to algorithms and knowing what you're doing.
Of course, it is possible to run another ANN with a different input size. A CNN usually consists of multiple different kinds of layers: so-called Convolutional Layers (CL), Pooling Layers (PL), and Fully-Connected Layers (FL) at the end of the network. I made a post about the structure of CNNs a while ago, so feel free to search for it.
CLs are used to recognize features within the input data. For pixel data, features are composed of adjacent pixels that form regions of certain contrast, shapes, edges, and so on. You usually have multiple CLs in parallel, because each CL searches the input data for one single feature (most frameworks call these parallel CLs the filters, or feature maps, of one convolutional layer). Again, this works in raster-scan fashion: every neuron in a CL looks at one small region of the input data. For example, a CL with a kernel size of 5x5 searches for features that are 5x5 pixels in size. For every feature you want to recognize, you have a dedicated CL in parallel, and these CLs are independent of each other. That independence gives you parallelism, so you can get a good speed-up from a GPGPU or other vector processing units. The computation is also very similar to matrix multiplication, so tricks such as the tiling (blocking) used in BLAS libraries can improve performance.
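To make the raster scan concrete, here is a minimal NumPy sketch of one CL searching for one feature. The function name `convolve_valid`, the edge kernel, and the toy image are made up for illustration; real toolkits implement this far more efficiently. As in most CNN libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
import numpy as np

def convolve_valid(image, kernel):
    """Slide `kernel` over `image` in raster-scan order (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for y in range(out_h):          # top to bottom
        for x in range(out_w):      # left to right
            patch = image[y:y + kh, x:x + kw]   # the 5x5 region this "neuron" sees
            out[y, x] = np.sum(patch * kernel)  # response of the feature detector
    return out

# Hypothetical 5x5 kernel that responds to a vertical dark-to-bright edge
edge_kernel = np.zeros((5, 5))
edge_kernel[:, :2] = -1.0
edge_kernel[:, 3:] = 1.0

# Toy 32x32 grayscale image: left half dark (0), right half bright (1)
image = np.zeros((32, 32))
image[:, 16:] = 1.0

feature_map = convolve_valid(image, edge_kernel)
print(feature_map.shape)   # spatial size shrinks by kernel_size - 1
print(feature_map.max())   # strongest response sits on the edge
```

The response is zero in the uniform regions and peaks where the 5x5 window straddles the edge. Each additional feature would get its own kernel and its own feature map, computed independently of the others.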
After every CL there is usually a Pooling Layer, which downsamples the recognized features. This is very important, as it reduces noise and feature "artifacts". It can be seen as a downsampling kernel (min, max, average, etc.). After that come further CLs and PLs. You stack them in order to recognize features at different levels, from low-level to high-level, since features can be composed of other features, and so on. The general structure of CNNs is modeled after the visual cortex in our brain, where simple cells and complex cells are stacked on top of each other. After multiple CLs and PLs come the FLs, which gather the recognized feature information and map it to the corresponding outputs.
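The downsampling step can be sketched in a few lines of NumPy. This assumes non-overlapping square windows; `pool2d` and the sample feature map are illustrative names and values, not taken from any toolkit.

```python
import numpy as np

def pool2d(feature_map, size=2, reduce=np.max):
    """Downsample with non-overlapping size x size windows.

    `reduce` can be np.max, np.min or np.mean.
    """
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size   # drop rows/cols that do not fill a window
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return reduce(blocks, axis=(1, 3))  # collapse each window to one value

fm = np.array([[1, 2, 0, 0],
               [3, 4, 0, 1],
               [0, 0, 5, 6],
               [0, 0, 7, 8]], dtype=float)

max_pooled = pool2d(fm, size=2, reduce=np.max)
avg_pooled = pool2d(fm, size=2, reduce=np.mean)
print(max_pooled)
print(avg_pooled)
```

Max pooling keeps only the strongest response in each window, which is why small, isolated responses tend to disappear after a few stacked CL/PL pairs.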
For example, if I want to recognize different traffic signs, I would use a fixed input size for my input image, for example 32x32 pixels RGB => 3x32x32 input data, and use multiple stacked parallel CLs and PLs. Maybe there are 10 parallel CL and PL pipes with 3 stacks each, so 6 layers per pipe in total. Then I add two FLs at the end of the network, consisting of 100 and then 30 neurons. If I want to recognize 10 different traffic signs, the CNN has 10 outputs. I then train that network with my gradient descent (GD) algorithm until it achieves a very high recognition rate on my evaluation data set. For every output I get a probability for the corresponding class. There can still be problems with false-positive and false-negative errors. For example, in the recognition of handwritten digits (MNIST), a "7" can look similar to a "1", so the recognition rate for those cases might suffer.
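As a rough sanity check of such a layout, the data size after each layer can be traced with plain arithmetic. The kernel size (3x3), the missing padding, the stride of 1, and the 2x2 pooling windows are assumptions made only for this sketch; the post above does not fix them.

```python
def conv_out(size, kernel):
    """Output width of a convolution without padding, stride 1."""
    return size - kernel + 1

def pool_out(size, window):
    """Output width of non-overlapping pooling."""
    return size // window

def trace_shapes(input_size=32, in_channels=3, pipes=10, stacks=3,
                 kernel=3, window=2):
    """Return the (channels, height, width) shape after every CL and PL."""
    shapes = [(in_channels, input_size, input_size)]
    size = input_size
    for _ in range(stacks):
        size = conv_out(size, kernel)       # CL
        shapes.append((pipes, size, size))
        size = pool_out(size, window)       # PL
        shapes.append((pipes, size, size))
    return shapes

shapes = trace_shapes()
for s in shapes:
    print(s)

# Flatten the last PL output and feed it through FL(100) -> FL(30) -> 10 outputs
channels, h, w = shapes[-1]
flat_features = channels * h * w
fc_sizes = [flat_features, 100, 30, 10]
fc_weights = sum(a * b for a, b in zip(fc_sizes, fc_sizes[1:]))  # biases not counted
print(flat_features, fc_weights)
```

With these assumed sizes, the 32x32 input shrinks to 2x2 per pipe after three stacks. That is one way to see why kernel size and stack depth have to be chosen together with the input size.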
And now for your question. If I create a CNN with a 32x32 pixel input size, it implicitly covers smaller sizes as well. As I mentioned, a CNN consists of stacked CLs and PLs with downsampling capabilities, so you are indirectly inspecting the input for smaller features. It is possible to train a 32x32 CNN to recognize 16x16 input data, but you need to make sure that the data is transformed accordingly (scaling, zero-padding) and that the network is trained to recognize such transformed data. You may then need larger CLs, or more stacked CLs and PLs, to recognize that data. It might also be better to run an independent CNN. So the final answer is, as always, "it depends".
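The two transformations mentioned above could look roughly like this in NumPy. The helper names `zero_pad_to` and `upscale_nearest` are hypothetical, and nearest-neighbour scaling is just the simplest choice; a real pipeline might use a better interpolation.

```python
import numpy as np

def zero_pad_to(image, target=32):
    """Center a (channels, h, w) image inside a target x target frame of zeros."""
    h, w = image.shape[-2:]
    top = (target - h) // 2
    left = (target - w) // 2
    pad = [(0, 0)] * (image.ndim - 2) + [
        (top, target - h - top),
        (left, target - w - left),
    ]
    return np.pad(image, pad, mode="constant", constant_values=0)

def upscale_nearest(image, factor=2):
    """Nearest-neighbour upscaling: repeat every pixel `factor` times per axis."""
    return image.repeat(factor, axis=-2).repeat(factor, axis=-1)

small = np.ones((3, 16, 16))       # stand-in for a 16x16 RGB image
padded = zero_pad_to(small, 32)    # 16x16 content surrounded by a zero border
scaled = upscale_nearest(small, 2) # content stretched to fill 32x32
print(padded.shape, scaled.shape)
```

Both results fit a 3x32x32 input, but they look very different to the network: padding keeps the features at their original pixel size, while scaling doubles it. Whichever transformation is used at recognition time should also be used when training.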
Creating and training a CNN is pretty simple with toolkits like Theano or Caffe. The real problems are: how do I preprocess the input data, how do I postprocess the output data, how many CLs and PLs are necessary, what is the right feature size, and so on. These are the true hindrances. CNNs and ANNs are black boxes once trained; you cannot simply take a look inside and understand what's going on.