What Are You Doing When You Code? On the Essence of Programming (1): State Machines
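This semester I took two interesting courses, Theory of Computation and Distributed System. They are branches at completely different levels, one low and one high, yet I was surprised to find that they overlap in theory: the state machine. In Theory of Computation, the DFA, the NFA, and the Turing Machine are all classic State Machines; in Distributed System, it shows up again in determining the Global State, in the Replicated State Machine of consensus protocols, and elsewhere.
So I looked up a paper by Turing Award winner Lamport, "Computation and State Machines", which describes the seemingly all-powerful State Machine. Below I excerpt the passages of this note that touch on the essence of programming; every sentence is a classic: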
1. Think Better, Code Better
“I believe that the best way to get better programs is to teach programmers how to think better. Thinking is not the ability to manipulate language; it’s the ability to manipulate concepts. Computer science should be about concepts, not languages. But how does one teach concepts without getting distracted by the language in which those concepts are expressed? My answer is to use the same language as every other branch of science and engineering—namely, mathematics.”
“The obsession with language is a strong obstacle to any attempt at unifying different parts of computer science. When one thinks only in terms of language, linguistic differences obscure fundamental similarities. Simple ideas can become complicated when they must be expressed in a particular language.”
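Lamport argues that the way to get programmers to write better code is not to keep teaching them how to use languages but to teach them how to think better, and that thinking is not the ability to use a programming language but the ability to manipulate concepts. Computer science should be about concepts, not specific languages. But how can one focus on concepts without being pulled toward the language used to express them? Lamport's answer: use the language shared by every branch of science and engineering, namely mathematics!
Differences between specific languages keep us from seeing fundamental similarities and hinder the unification of the branches of computer science, and simple ideas become complicated. I strongly agree with this view! I used to be very fond of concrete algorithm implementations, but after studying Distributed System this semester I began to appreciate the beauty of abstraction. The most common example when learning to program is pseudocode. (The original post showed a small excerpt of pseudocode for the Bully algorithm for distributed election here; the figure is not preserved.)
CLRS also has plenty of very elegant pseudocode worth consulting.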
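Since the pseudocode figure is missing, here is a minimal Python sketch of the core idea of a Bully election. It is not the author's pseudocode: the function name `bully_election`, the `alive` set, and the synchronous, failure-free message handling are all assumptions made for illustration, and timeouts and real networking are left out.

```python
def bully_election(initiator, alive):
    """Return the coordinator chosen when `initiator` starts an election.

    `alive` is the set of process ids that currently answer messages
    (a hypothetical stand-in for real timeouts and network calls).
    """
    if initiator not in alive:
        raise ValueError("initiator must be alive")
    candidate = initiator
    while True:
        # Send ELECTION to every higher id; only live processes answer OK.
        responders = {p for p in alive if p > candidate}
        if not responders:
            # Nobody higher answered: candidate announces itself COORDINATOR.
            return candidate
        # Every responder would start its own election; following one of
        # them is enough to show the hand-off toward the highest live id.
        candidate = min(responders)


print(bully_election(initiator=2, alive={1, 2, 3, 5}))  # 5
```

Whichever responder is followed, the election ends at the highest live id, which is the property that gives the algorithm its name.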
Pseudocode: Show your Idea
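For learning to write or appreciate pseudocode, I have a method I have tried myself: take a piece of not-too-complicated implementation code, say in Java, and, following the examples in a textbook, try to translate it into clean pseudocode. Here is a brief account of how pseudocode differs from implementation code, and where I find its beauty:
- Omitting the "trivial details": pseudocode leaves out Assert or parameter precondition checks, Log output, Error catching and handling, and implementation details such as List/Set/Array formats and conversions. "Trivial" is not really the right word, because these are all crucial in engineering practice, hence the quotation marks.
- Concise, elegant, and uniform: I had not noticed before that parts of pseudocode are very close to mathematics and so inherit its concision, elegance, and uniformity, for example adding and removing set elements, intersection and union, and the empty set. Pseudocode also lists and describes global data very clearly, which is worth learning from!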
2. Everyday Math
“Much of computer science is about state machines. This is as obvious a remark as saying that much of physics is about equations.”
“State machines provide a framework for much of computer science. They can be described and manipulated with ordinary, everyday mathematics — that is, with sets, functions, and simple logic. State machines therefore provide a uniform way to describe computation with simple mathematics.”
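Much of computer science is about the State Machine, which is as obvious as saying that much of physics is about equations. But why can the state machine serve as the foundation of so vast a field? Because a State Machine can be described with the most ordinary, simple mathematics: sets, functions, and logic. Sets represent the states and the Alphabet (that is, the Domain of the Input), a function represents the Transition table, and logic underpins correctness. (The original post showed the classic definition of a Turing machine, one kind of state machine, here; the figure is not preserved.)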
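For reference, the standard textbook formulation of a Turing machine can be written as below. This is a reconstruction in the style of Sipser's 7-tuple, not the original figure, and the symbol names follow that textbook convention.

```latex
M = (Q, \Sigma, \Gamma, \delta, q_0, q_{\text{accept}}, q_{\text{reject}}), \quad\text{where}
\begin{aligned}
  & Q \text{ is a finite set of states}, \\
  & \Sigma \text{ is the input alphabet, with } \sqcup \notin \Sigma, \\
  & \Gamma \text{ is the tape alphabet, with } \sqcup \in \Gamma \text{ and } \Sigma \subseteq \Gamma, \\
  & \delta : Q \times \Gamma \to Q \times \Gamma \times \{L, R\} \text{ is the transition function}, \\
  & q_0 \in Q \text{ is the start state}, \\
  & q_{\text{accept}}, q_{\text{reject}} \in Q \text{ with } q_{\text{accept}} \neq q_{\text{reject}}.
\end{aligned}
```

Every component is a set, an element of a set, or a function, which is exactly the "everyday mathematics" the quote refers to.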
3. State Machine = Computing Object
“Computing objects—objects that compute. A computation is a sequence of steps, which I call a behavior. When a computation is a state behavior (
Ask yourself: What is Computation? What are you doing?
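As for what Computation really is, "Introduction to the Theory of Computation" puts it more clearly: "A Turing machine computes a function by starting with the input to the function on the tape and halting with the output of the function on the tape. A function is a computable function if some Turing machine M, on every input w, halts with just f(w) on its tape." If a Turing machine, on every input w, always halts with the output f(w), we say that the machine computes f, and the process of running it (a series of steps, actions, or state transitions plus actions) is the computation. f(w) is the result it computes for each input w. So this more formal definition of computation is, in essence, saying the same thing as Lamport, only in different words.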
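To make "halts with just f(w) on its tape" concrete, here is a small Python sketch of a single-tape machine computing the unary successor f(1^n) = 1^(n+1). The names `run_turing_machine`, `read_tape`, and `SUCC` are hypothetical, and the tape is modeled as a sparse dictionary that extends in both directions, a simplification of the textbook one-way tape.

```python
BLANK = "_"


def read_tape(tape):
    """Return the tape contents as a string, without surrounding blanks."""
    if not tape:
        return ""
    cells = [tape.get(i, BLANK) for i in range(min(tape), max(tape) + 1)]
    return "".join(cells).strip(BLANK)


def run_turing_machine(delta, start, halt_states, w, max_steps=10_000):
    """Run the machine on input w; return (final tape, list of configurations)."""
    tape = dict(enumerate(w))          # sparse tape: position -> symbol
    state, head, steps = start, 0, 0
    behavior = [(state, head, read_tape(tape))]
    while state not in halt_states:
        if steps >= max_steps:
            raise RuntimeError("machine did not halt within max_steps")
        symbol = tape.get(head, BLANK)
        state, write, move = delta[(state, symbol)]
        tape[head] = write
        head += 1 if move == "R" else -1
        steps += 1
        behavior.append((state, head, read_tape(tape)))
    return read_tape(tape), behavior


# Transition function for the unary successor: skip the 1s, append one more.
SUCC = {
    ("scan", "1"): ("scan", "1", "R"),
    ("scan", BLANK): ("done", "1", "R"),
}

output, behavior = run_turing_machine(SUCC, "scan", {"done"}, "111")
print(output)  # 1111
```

The returned `behavior` list is the sequence of configurations the machine passes through, that is, the computation itself; `output` is only its final result.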
Examples: State/State-Action Machine
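Lamport lists some classic examples of State Machines:
- Automata: the first half of this semester's Theory of Computation was all about Automata, from DFA, NFA, Regular Expression, and CFL to the most famous of all, the Turing Machine. Some have only an internal state, while others add Actions plus a Stack (the pushdown automata that recognize CFLs). Among them the Turing machine has the most complex state: internal state, head position, tape contents, and so on.
- von Neumann Computer: the von Neumann machine is closer to today's computers. Its state extends that of the Turing machine: the contents of all registers and memory, the location the program counter (PC) points to, and so on.
- Algorithm: the usual definition of an algorithm is a recipe-like series of steps that produces a behavior. Seen this way, doesn't it look a lot like the definition of a Computing Object (State Machine)? Most surprising of all, distributed algorithms can also be described with State Machines.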
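The simplest item in the list above, a DFA, can be sketched in a few lines of Python. The machine below, which accepts binary strings containing an even number of 1s, and the names `state_behavior` and `accepts` are illustrative choices, not taken from Lamport's note.

```python
# A DFA given by plain sets and a function (here a dict used as a lookup table).
STATES = {"even", "odd"}
ALPHABET = {"0", "1"}
DELTA = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}
START = "even"
ACCEPT = {"even"}


def state_behavior(w):
    """Return the sequence of states the DFA visits while reading w."""
    states = [START]
    for ch in w:
        if ch not in ALPHABET:
            raise ValueError(f"symbol {ch!r} is not in the alphabet")
        states.append(DELTA[(states[-1], ch)])
    return states


def accepts(w):
    """A string is accepted when the behavior ends in an accepting state."""
    return state_behavior(w)[-1] in ACCEPT


print(state_behavior("1011"))  # ['even', 'odd', 'odd', 'even', 'odd']
print(accepts("1001"))         # True
```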
“It would be an absurd trivialization to say that Turing machines and distributed algorithms are the same because they are state machines, just as it would be absurd to say that relativity and quantum mechanics are the same because they use equations.”
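Lamport also reminds readers here that although many problems in computer science can be expressed as state machines, this does not mean they have much in common, let alone that they are identical. That would be as absurd as saying that relativity and quantum mechanics are the same thing because both use equations. It is a fine analogy! Don't let the hammer in your hand make everything look like a nail!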
4. State Tree = Computation = Operational Semantics
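We have covered the concepts of Computing Object and Computation and how they relate: a Computing Object produces a Computation through a series of steps/behaviors, and if the behavior is stateful, the Computing Object can be represented by a State Machine. So what is an intuitive representation of a Computation? The answer is another pillar of programming: the Tree.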
“One alternate definition of a computation is a state tree, which is a tree whose nodes are labeled by states. A state tree describes the tree of possible executions from the root’s state. A state machine generates every state tree for which (i) the root node is in the set of initial states and (ii) a node labeled with any state s has a child labeled with state t iff ⟨s, t⟩ is in the next-state relation (transition function).”
“For a C program, the state will describe what program variables are currently defined, their values, the contents of the heap and of the program’s call stack, the current control point, and so on. Specifying how to translate any legal C program into a state machine essentially means giving an operational semantics to the language. Writing an operational semantics is the only practical method of formally specifying the meaning of an arbitrary program written in a language as complicated as C.”
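Each node of the Tree represents a State, and the root is an initial state. Each edge is one Transition from the Transition table; depending on whether the machine is a State or a State-Action Machine, an edge may also carry an Action. Translating a program into a state machine in this way gives the program an operational semantics, which is also the most practical way to specify the meaning of an arbitrarily complex program. For a C or Java program, the state is the local variables, the heap, the call stack, the current execution point, and so on. So the Recursion Tree we will see later, which uses the arguments to represent the step-by-step recursive decomposition of a problem, is really the Computation of the State Machine that computes that problem.
We have now introduced the first important concept in the essence of programming, the State Machine, and with it as the center drawn a circle around almost every concept this article will cover: Set, Function, Logic, Tree, and more. The following installments take these up one by one, to give readers a broad picture of the essence of programming.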
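The state tree in the quote can be unfolded mechanically from a next-state relation. The Python sketch below assumes a small made-up relation `NEXT` and a hypothetical helper `state_tree`; because a state tree can be infinite, the unfolding is cut off at a fixed depth.

```python
def state_tree(state, next_state, depth):
    """Unfold the tree of possible executions from `state`.

    `next_state` is a set of (s, t) pairs; a node labeled s gets one child
    per t with (s, t) in the relation. The tree is truncated after `depth`
    levels because the full state tree may be infinite.
    """
    if depth == 0:
        return (state, [])
    children = sorted(t for (s, t) in next_state if s == state)
    return (state, [state_tree(t, next_state, depth - 1) for t in children])


# A made-up next-state relation over the states 0, 1, 2.
NEXT = {(0, 1), (0, 2), (1, 2), (2, 0)}

tree = state_tree(0, NEXT, depth=2)
print(tree)  # (0, [(1, [(2, [])]), (2, [(0, [])])])
```

Each root-to-leaf path in the result is one possible behavior of the machine, which is why the tree as a whole captures every computation from the root state.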