Preface: this whole thing was, on purpose, written before doing any research into feasibility or into whether parts of it already exist, so take everything with a grain of salt. While most of it will probably turn out differently, I expect it to help set a direction for something that could be really interesting and to give a few pointers along the way. So, let's go:
I expect programming to become more declarative and a lot less exact. To elaborate on that, we first need to look at how programming works today: you usually have some input or a system state and want to get to a result or another system state. You do this by defining a series of exact steps that gets from the input to the result. Of course, this isn't done directly in system instructions anymore - there are multiple levels of abstraction in between, but each of these abstractions is exactly specified relative to the layer below, without any ambiguity. This means we still exactly specify a series of small steps the computer then follows. To summarize: there is the underlying assumption that the only thing a computer should do is exactly follow a series of steps, and if there are none, it should do nothing.
There have been attempts to create a way of programming that works without giving exact steps, using a more declarative method that involves logic statements and constraints and running deduction or a solver over them. The problem is that if the program is underspecified, deduction may not get anywhere, and the solver will give us a lot more than just the "correct" / wanted result. If we pick one at random, it is most likely the wrong one, which brings us to the next underlying assumption: if the program leaves a choice / a degree of freedom, the computer is expected not to do the right thing.
Of course, these assumptions are perfectly valid and reasonable, at least for the computers we have today, the way we program, and the problems we use them for - and many of them are going to stay. But let's look beyond that and turn these assumptions around: what if programming didn't involve defining exact steps, but instead just roughly defining what we have and what we want, maybe giving a few hints, and having the computer generally do the right thing - wouldn't that be awesome?
Sounds impossible, right? Not necessarily. Let's first have a look at a few things that have been happening in recent years and after that some ideas that could be worth exploring to get there.
Observations and expectations
The first computers were used for a variety of tasks: as advanced calculators, for managing a company's finances, and even for controlling rockets and calculating a route to the moon. All these application areas have one thing in common: there already was one known correct solution to the problem - it just had to be automated because performing the task by hand was too tedious and neither accurate nor fast enough. In addition, the way these computers were used had one very interesting characteristic: the result wanted from them was defined as being the result of that exact calculation. To understand what I'm trying to say with that sentence, we need to jump back to today:
Today, a lot of these calculation tasks still exist, but most use cases have moved up one level: the focus is no longer on getting the result of following a specific set of instructions - today we just have a goal without caring how to get there. It goes even further than that: for a lot of goals, like entertaining people, selling as much stuff as possible, or even smaller-scale ones like having a usable interface, we don't even know the right way to get there. Yet we still try to squeeze this into our exact, step-by-step "we know what to go for and how to get there" programming model.
When looking at chip development, we can also see some big changes coming up: transistors are getting so small that we are close to the physical limit of still being able to guarantee that they function correctly all the time. This is extremely at odds with a programming model that relies on exact results and on every step being executed correctly. What if we instead could use and build hardware on which calculations only had to be ballpark correct most of the time? This could not only solve the scaling issue, it could also open up completely new ways of computing, for example by embedding biological elements.
In addition to changes in requirements and challenges in chip development, systems have to process more and more data that is noisier than ever.
The thing we are looking for has to handle any kind of noise very well - in the input data, and during processing in the form of only ballpark-accurate results and occasional errors. In addition, it should have a way to figure out how to do "the right thing" without being given the exact steps. One area that is quite suited for this is recent advances in machine learning, especially neural networks. What if, instead of just calling them in some parts of our code, we built a whole computing and programming model on top of them?
What to look at to get there
A little disclaimer: I don't expect neural networks to be the only solution for this kind of problem, they may not even be the right one, but at the moment they are the most promising one. To be able to understand the following parts you should have some knowledge about neural networks.
I believe there are three major areas to work on: the first is programming interfaces, meaning what our code will look like in the new programming model. The second, neural network interfaces, is about internal data representation in neural networks and how to connect them to the outside world / make use of existing technology, and automatic connections between neural networks. And finally I will describe a few things that should be changed in neural networks to enable everything else.
Imagining what code should look like is hard without having something to build on - especially because it greatly depends on the capabilities of the underlying computing model. And yet it is very important, because it gives the rest a target to go for, and defines what would be necessary to get there.
My first intuition would be to go for a combination of a declarative approach (to be able to define what we have, what we want and a few constraints the net has to work around) and component based thinking (with the assumption that there are existing neural network parts that have been trained to do or recognize certain things). Inspiration for the declarative part could come from taking a peek at the syntax of existing declarative languages like Prolog in addition to maybe some ideas from different probabilistic programming approaches.
This is the part that is most likely to be wrong (and it definitely goes beyond the capabilities we currently have in the area of neural networks), but I will try to give a first attempt at how such code could look - in this case, to make some money with a photo site:
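What follows is a purely speculative sketch of what such declarative, component-based code might look like - every keyword, component name, and path here is invented for illustration; none of this syntax exists anywhere:

```
have photos:   collection of images from storage("/photos")
have visitors: stream of incoming web requests
want revenue:  maximized over time

constraint page_load_time < 2 seconds
constraint shown photos match visitor interests

use component image_tagger        # pre-trained: image -> descriptive tags
use component interest_estimator  # pre-trained: request history -> interests
use component layout_generator    # pre-trained: content -> HTML page

hint similar visitors probably like similar photos
```

The `use component` lines stand for the component-based part; the `have` / `want` / `constraint` / `hint` lines for the declarative part.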
This code still contains a fair share of plumbing as it makes use of a lot of existing infrastructure.
Finally, there is a bit of glue code around to wire everything up.
I have to admit this is quite complicated. Because of this, I prepared a second example that is a bit closer to today's capabilities and therefore easier to understand. In this case we want to train a neural network for binary image segmentation. This means we have an input image and a mask that colors all relevant pixels, for example, all pixels that are part of a car.
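As a hedged, purely invented sketch (again, none of this syntax exists), the segmentation example could be as small as:

```
have images: collection of RGB images
have masks:  collection of binary images, one per input image
want segment: images -> masks

constraint output size == input size
train segment from (images, masks)
```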
As you might notice, we do not specify the actual layout of the neural network - this should happen automatically during training. We would still be able to do it in this case, but in the earlier example it would be very hard to come up with anything useful by hand. Automating the layout of a neural network will be one of the more challenging tasks on the way there - a few ideas to get started with follow in the next sections.
Neural Network Interfaces
There are two major topics to work on in the area of neural network interfaces: training neural networks to interface with existing tools, and automatic training of interfaces between neural networks.
The interfaces to existing tools or data can be very easy, as in the case of images, or quite complicated, like the imaginary HTML example before. In general, there are two viewpoints from which this task can be approached.
The first and currently more popular one is the outside view, which looks at how to feed data into neural networks, usually using a very simple encoding like pixel values or character series, and how to get the result out again. There has already been significant research on different tasks, including images, having neural networks generate and parse text, and even having them read code and compute the result.
The second and, in my opinion, far more interesting approach would be to take the internal viewpoint and start thinking about how neural networks could be trained to take advantage of existing tools and access data themselves instead of having it fed to them. For example, humans do not usually read a text by having it fed to them character by character; instead we take in larger chunks of multiple characters and jump back and forward - maybe this could also be helpful for neural networks. Another example would be writing code - nobody can just write down a series of characters and be done with it. We jump around there too, have auto-complete, and even get a feedback loop by just running the part we already have - giving a neural network access to these features could also be very helpful.
Automatic training of interfaces between neural networks will be one of the core challenges in getting everything to work. One very naive approach could be to just connect everything to everything and then gradually drop connections that only carry an insignificant weight. The downside is that it could be very slow, and in addition it does not create the advanced multilayer structure that will probably be necessary for good results most of the time. Another approach could be to start with nothing and dynamically add neurons and connections as necessary - the difficulty here is finding out where and when.
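A minimal sketch of the first, naive approach - fully connect everything, then drop connections whose weight is insignificant - using plain magnitude pruning on a toy weight matrix (the sizes and threshold are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)

# Start fully connected: every one of 8 inputs talks to every one of 8 outputs.
weights = rng.normal(0.0, 1.0, size=(8, 8))

def prune(weights, threshold=0.5):
    """Zero out connections whose weight magnitude is insignificant."""
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

pruned, mask = prune(weights)
print(f"kept {int(mask.sum())} of {weights.size} connections")
```

In a real system the pruning would presumably run repeatedly during training, gradually tightening the threshold, rather than once at the end.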
Neural Networks Themselves
There are a number of things that could be interesting to work on based on current neural network ideas, even without working towards this new way of programming. I believe one of the biggest problems is the use of Error Propagation and Gradient Descent. This will increasingly become an issue as networks get deeper and more irregular, as in this use case. I expect the path forward lies in finding a way to create some kind of locally emergent behavior.
In the meantime, it could be useful to get an idea of what the error space of neural networks looks like. To do this, one could collect many different parameter weight configurations and the corresponding error values and find a way to get information out of them. This could be done either by developing a way of mapping the high-dimensional parameter space down to three dimensions to inspect it visually, or at least by finding metrics that also work in higher dimensions. One idea (that will most likely not work, but gives an example of what to go for) is building a 3D map by doing something like a PCA on all weight parameters down to two values and using the loss or error as a third, the height value. With this one could look at the resulting height profile and maybe find something that the optimal results have in common. The problem with using PCA will probably be that the optimal and all other values end up all over the place, leading to a huge mess. But maybe there is another transformation that would work in this case.
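To make the idea concrete, here is a toy sketch of that PCA-to-height-map construction; the quadratic `loss` is a stand-in for a real network's error, and all sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)

def loss(w):
    """Toy stand-in for a network's error over its weight vector."""
    return float(np.sum(w ** 2))

# Collect many weight configurations and their corresponding error values.
configs = rng.normal(size=(200, 10))   # 200 "networks", 10 weights each
errors = np.array([loss(w) for w in configs])

# PCA via SVD: project the 10-D weight space down to its top 2 components.
centered = configs - configs.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
xy = centered @ vt[:2].T               # 2-D map coordinates

# Each row is (x, y, height): a crude 3-D height map of the error space.
height_map = np.column_stack([xy, errors])
print(height_map.shape)
```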
As for metrics on higher-dimensional parameters, the shape of the valleys around local and global minima would be interesting. My wild guess would be that optima that come from overfitting sit in very narrow and steep valleys, while optima from real generalization sit in wider ones. Of course, this is probably completely wrong, but it gives you an idea why these kinds of metrics could be useful for building better optimization algorithms.
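One hedged guess at such a metric: perturb the parameters at a minimum and measure how fast the error rises; by the valley picture above, a narrow steep valley should score much higher than a wide one. The two toy loss functions below are stand-ins, not real networks:

```python
import numpy as np

rng = np.random.default_rng(2)

def sharpness(loss, w_opt, radius=0.1, samples=100):
    """Average loss increase under small random perturbations of w_opt:
    a crude proxy for how narrow the surrounding valley is."""
    base = loss(w_opt)
    total = 0.0
    for _ in range(samples):
        step = rng.normal(size=w_opt.shape)
        step *= radius / np.linalg.norm(step)  # fixed-length random direction
        total += loss(w_opt + step) - base
    return total / samples

w = np.zeros(10)                                  # minimum of both toy losses
wide = lambda w: float(np.sum(w ** 2))            # wide, shallow valley
narrow = lambda w: float(100.0 * np.sum(w ** 2))  # narrow, steep valley

print(sharpness(wide, w), sharpness(narrow, w))   # the steep valley scores higher
```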
Another very important goal would be to find a way of training neural networks without defining the layout in advance, and instead have the training process find a suitable one. As a starting point, there should be a metric for the usefulness of a single neuron and of a single connection. For a connection, the weight - meaning its distance from zero - could work, because a low value means the information going over this connection has little influence and therefore probably is not very relevant. The usefulness of a neuron could then be defined as the combined usefulness of all its connections, maybe minus or divided by a score for similarity to other neurons in the layer, because if two neurons have identical connections, we probably only need one of them.
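A toy sketch of these two metrics - absolute weight for connections, summed connection usefulness scaled down by a redundancy penalty for neurons. The concrete penalty (cosine similarity to the most similar neighbor) is my own arbitrary choice, not anything validated:

```python
import numpy as np

rng = np.random.default_rng(3)

def connection_usefulness(weights):
    """A connection's usefulness: its weight's distance from zero."""
    return np.abs(weights)

def neuron_usefulness(weights):
    """Per-neuron usefulness: summed usefulness of incoming connections,
    scaled down by similarity to other neurons in the layer (two neurons
    with identical connections are not worth keeping twice)."""
    base = connection_usefulness(weights).sum(axis=1)
    norms = np.linalg.norm(weights, axis=1, keepdims=True)
    unit = weights / np.maximum(norms, 1e-12)
    sim = np.abs(unit @ unit.T)        # cosine similarity between neurons
    np.fill_diagonal(sim, 0.0)         # ignore self-similarity
    return base * (1.0 - sim.max(axis=1))

layer = rng.normal(size=(5, 4))        # 5 neurons, 4 incoming connections each
layer[4] = layer[0]                    # neuron 4 duplicates neuron 0
print(neuron_usefulness(layer))        # the duplicated pair scores near zero
```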
Training could then start with just one neuron between the input and the output, dynamically adding and removing neurons until getting close to some usefulness threshold. A new layer - again with one neuron - could then be created based on another metric, maybe something like one layer being a lot bigger than both of its neighbors. In the long term, I expect it could be better to move away from the layer structure entirely, especially when moving towards dedicated hardware instead of float-crunching on GPUs. In addition, some of the more complex layouts may not even be possible with a clean layer structure.
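Put together, the growth procedure described above might look roughly like this; the thresholds, the choice of where to add neurons, and the layer-size rule are all placeholders:

```
net = one neuron between input and output
repeat:
    train net for a while
    remove neurons whose usefulness stays below a threshold
    if all remaining neurons are useful enough:
        add a fresh neuron to the most useful layer
    if some layer is much bigger than both of its neighbors:
        insert a new layer next to it, starting with one neuron
until adding neurons stops improving overall usefulness
```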
This text is by no means complete - the reason for writing it is to give a few ideas of where to start and what to work on, to me and maybe even to you. If you are working on something similar, or on parts of it, or have a few ideas, suggestions, or questions, feel free to contact me at firstname.lastname@example.org. (I even prepared a nice mailto: link that already pre-fills half of the email so you only have to write the content. You should go for it - just click on the mail address.)
I believe a lot of research today is limited by first looking at, and becoming an expert in, the status quo and then building small iterative improvements. It would be better to first find a goal and then look for a way of getting there - at least that's how we got to the moon.