How to use JavaScript compiler

Tan Guangzhi 2020-11-10 15:10:47
use javascript compiler


A compiler is a program , Role is Translate one language into another .

for example babel It's a compiler , It will es6 Version of js Translate into es5 Version of js. From this point of view , Software translated into English will also be translated into English .

General procedure ,CPU It can't be executed directly , because CPU Only machine instructions can be recognized . So if you want to execute a program , The first step is to translate the program written in high-level language into assembly code (Java One more step , Translate high-level language into bytecode ), Then translate the assembly code into machine instructions , such CPU To identify and execute .

Because assembly language and machine language correspond one to one , And assembly language is more readable . Therefore, the textbook of computer principle will use assembly language instead of machine language to explain machine instructions .

This paper will write four operations compiler need to be 1 + 1 These four expressions are translated into machine instructions and executed . Please see the example for the specific process :

// CPU Can't recognize
10 + 5
// Translated into assembly language
push 10
push 5
add
// Finally translated into machine instructions , Assembly code corresponds to machine instructions one by one
// Machine instructions are given by 1 and 0 form , The following instructions are not real instructions , Just for demonstration
0011101001010101
1101010011100101
0010100111100001

Four compiler operations , Although the function is very simple , Only four Operational expressions can be compiled . But the front-end part of the principle of compilation almost involves : Lexical analysis 、 Syntax analysis . In addition, there is the code generation of the back-end part of the compilation principle . Whether it's simple 、 Complex compilers , The compilation steps are similar , It's just that complicated compiler implementations are more difficult .

Someone might ask , What's the advantage of learning to compile

I think that mastering the internal principles of the compilation process will make you a better advanced programmer . In addition, I quote Zhihu net friend - What you want Answer , More specific :

  1. It is easier to understand which writing methods are equivalent in a language , What are the differences
  2. We can compare the differences between different languages more objectively
  3. It is not easy to be fooled by the propagandist of a particular language
  4. Learning a new language is more efficient
  5. In fact, from the language a Switch to language b It's a general requirement , Learning the principles of compilation will make you more comfortable when dealing with such requirements

Okay , Let's take a look at how to write a quad arithmetic compiler .

Lexical analysis

A program is actually a series of characters stored in a text file , The function of lexical analysis is to decompose a series of characters into characters according to some rules (token, Also called terminator ), Ignore spaces and comments .

Example :

// Program code
10 + 5 + 6
// The result of lexical analysis is token
10
+
5
+
6

Terminator

Terminator is the basic element used in language , It can't be broken down anymore .

The terminators in the four operations include symbols and integer constants ( Unary operators and floating point operations are not supported ).

  1. Symbol + - * / ( )
  2. Integer constant :12、1000、111...

Lexical analysis code implementation

function lexicalAnalysis(expression) {
const symbol = ['(', ')', '+', '-', '*', '/']
const re = /\d/
const tokens = []
const chars = expression.trim().split('')
let token = ''
chars.forEach(c => {
if (re.test(c)) {
token += c
} else if (c == ' ' && token) {
tokens.push(token)
token = ''
} else if (symbol.includes(c)) {
if (token) {
tokens.push(token)
token = ''
}
tokens.push(c)
}
})
if (token) {
tokens.push(token)
}
return tokens
}
console.log(lexicalAnalysis('100 + 23 + 34 * 10 / 2'))
// ["100", "+", "23", "+", "34", "*", "10", "/", "2"]

The grammatical rules of four operations ( Grammar rules are hierarchical )

  1. x*, Express x Zero or more times
  2. x | y, Express x or y Will appear
  3. ( ) parentheses , Used to group words in language

The following rules look from left to right , The expression on the left can be further subdivided into the expression on the right , It's subdivided until it can't be subdivided again .

  • expression: addExpression
  • addExpression: mulExpression (op mulExpression)*
  • mulExpression: term (op term)*
  • term: '(' expression ')' | integerConstant
  • op: + - * /

addExpression Corresponding + - expression ,mulExpression Corresponding * / expression .

If you don't understand the above rules , Then put it down first , Keep looking down . See how to implement syntax analysis in code .

Syntax analysis

A process in which input text is analyzed according to grammatical rules and its grammatical structure is determined , It's called parsing .

The output of general parsing is abstract syntax tree (AST) Or parse tree (parse tree). But because the four operations are relatively simple , So the solution here is to do code generation and error reporting in real time , In this way, you don't need to save the whole program structure in memory .

Let's take a look at how to analyze a four Operational expression 1 + 2 * 3.

The first match is expression, Due to the present expression There's only one way to go down , namely addExpression, So it's broken down into addExpression.
By analogy , The next order is mulExpressionterm1(integerConstant)、+(op)、mulExpressionterm2(integerConstant)、*(op)、mulExpressionterm3(integerConstant).

As shown in the figure below :

There may be questions here , Why is an expression so complicated ,expression There is addExpression,addExpression And then there is mulExpression.
In fact, this is to consider the priority of operators ,mulExpr Than addExpr Expression operation level should be high .

1 + 2 * 3
compileExpression
| compileAddExpr
| | compileMultExpr
| | | compileTerm
| | | |_ matches integerConstant push 1
| | |_
| | matches '+'
| | compileMultExpr
| | | compileTerm
| | | |_ matches integerConstant push 2
| | | matches '*'
| | | compileTerm
| | | |_ matches integerConstant push 3
| | |_ compileOp('*') *
| |_ compileOp('+') +
|_

There are many algorithms for building parsing trees , There are only two algorithms .

Recursive descent analysis

Recursive descent analysis , Also known as top-down analysis . Analyze step by step recursively according to the rules of grammar token flow , If you encounter a non Terminator , Then go on to analyze , Until the Terminator .

LL(0) analysis

Recursive descent analysis is a simple and efficient algorithm ,LL(0) On this basis, there is one more step , When the first one token When it's not enough to determine the element type , Take... For the next character “ Check in advance ”, It's possible to solve this uncertainty .

The above is a brief introduction to these two algorithms , See the following code implementation for specific implementation .

Expression code generation

The four Operational expressions we usually use are infix expressions , But for computers, infix expressions are not easy to calculate . So in the code generation phase , To convert infix expression to suffix expression .

Postfix expression

Postfix expression , Also known as reverse Polish , Does not contain brackets , The operator is placed after two operands , All calculations are in the order in which the operators appear , Strictly from left to right ( The precedence rules for operators are no longer considered ).

Example :

Infix expression : 5 + 5 Convert to suffix expression :5 5 +, Then generate code according to the suffix expression .

// 5 + 5 Convert to 5 5 + Generate code again
push 5
push 5
add

Code implementation

The theoretical knowledge of compiler principles is like the book of heaven , It's often seen in the clouds , But really do it , You'll find that , Actually, it's quite simple .

If the above theoretical knowledge is not very clear , No problem , First look at the code implementation , And then combine it with theoretical knowledge to see .

Be careful : Here we need to introduce the lexical analysis code .

// Assembly code generator
function AssemblyWriter() {
this.output = ''
}
AssemblyWriter.prototype = {
writePush(digit) {
this.output += `push ${digit}\r\n`
},
writeOP(op) {
this.output += op + '\r\n'
},
// Output assembly code
outputStr() {
return this.output
}
}
// parsers
function Parser(tokens, writer) {
this.writer = writer
this.tokens = tokens
// tokens Array index
this.i = -1
this.opMap1 = {
'+': 'add',
'-': 'sub',
}
this.opMap2 = {
'/': 'div',
'*': 'mul'
}
this.init()
}
Parser.prototype = {
init() {
this.compileExpression()
},
compileExpression() {
this.compileAddExpr()
},
compileAddExpr() {
this.compileMultExpr()
while (true) {
this.getNextToken()
if (this.opMap1[this.token]) {
let op = this.opMap1[this.token]
this.compileMultExpr()
this.writer.writeOP(op)
} else {
// There is no corresponding operator on the match There's no match here + -
// take token Index back one bit
this.i--
break
}
}
},
compileMultExpr() {
this.compileTerm()
while (true) {
this.getNextToken()
if (this.opMap2[this.token]) {
let op = this.opMap2[this.token]
this.compileTerm()
this.writer.writeOP(op)
} else {
// There is no corresponding operator on the match There's no match here * /
// take token Index back one bit
this.i--
break
}
}
},
compileTerm() {
this.getNextToken()
if (this.token == '(') {
this.compileExpression()
this.getNextToken()
if (this.token != ')') {
throw ' Missing right bracket :)'
}
} else if (/^\d+$/.test(this.token)) {
this.writer.writePush(this.token)
} else {
throw ' FALSE token: The first ' + (this.i + 1) + ' individual token (' + this.token + ')'
}
},
getNextToken() {
this.token = this.tokens[++this.i]
},
getInstructions() {
return this.writer.outputStr()
}
}
const tokens = lexicalAnalysis('100+10*10')
const writer = new AssemblyWriter()
const parser = new Parser(tokens, writer)
const instructions = parser.getInstructions()
console.log(instructions) // Output generated assembly code
/*
push 100
push 10
push 10
mul
add
*/

Simulation execution

Now let's simulate CPU The execution of machine instructions , Because assembly code corresponds to machine instructions one by one , So we can create a simulator that directly executes assembly code .
Before creating the emulator , Let's first explain the operation of the relevant instructions .

Stack

In memory , The characteristic of stack is that it can only insert and delete at the same end , That is, only push and pop Two kinds of operations .

push

push The purpose of the instruction is to push an operand onto the stack .

pop

pop The function of the instruction is to pop an operand out of the stack .

add

add The function of an instruction is to execute it twice pop operation , Pop up two operands a and b, And then execute a + b, Then the result push Go to the stack .

sub

sub The function of an instruction is to execute it twice pop operation , Pop up two operands a and b, And then execute a - b, Then the result push Go to the stack .

mul

mul The function of an instruction is to execute it twice pop operation , Pop up two operands a and b, And then execute a * b, Then the result push Go to the stack .

div

sub The function of an instruction is to execute it twice pop operation , Pop up two operands a and b, And then execute a / b, Then the result push Go to the stack .

All the instructions of the four operations have been explained , Do you think it's very simple ?

Code implementation

Be careful : Need to introduce lexical analysis and syntax analysis code

function CpuEmulator(instructions) {
this.ins = instructions.split('\r\n')
this.memory = []
this.re = /^(push)\s\w+/
this.execute()
}
CpuEmulator.prototype = {
execute() {
this.ins.forEach(i => {
switch (i) {
case 'add':
this.add()
break
case 'sub':
this.sub()
break
case 'mul':
this.mul()
break
case 'div':
this.div()
break
default:
if (this.re.test(i)) {
this.push(i.split(' ')[1])
}
}
})
},
add() {
const b = this.pop()
const a = this.pop()
this.memory.push(a + b)
},
sub() {
const b = this.pop()
const a = this.pop()
this.memory.push(a - b)
},
mul() {
const b = this.pop()
const a = this.pop()
this.memory.push(a * b)
},
div() {
const b = this.pop()
const a = this.pop()
// Floating point operations are not supported , So round it up here
this.memory.push(Math.floor(a / b))
},
push(x) {
this.memory.push(parseInt(x))
},
pop() {
return this.memory.pop()
},
getResult() {
return this.memory[0]
}
}
const tokens = lexicalAnalysis('(100+ 10)* 10-100/ 10 +8* (4+2)')
const writer = new AssemblyWriter()
const parser = new Parser(tokens, writer)
const instructions = parser.getInstructions()
const emulator = new CpuEmulator(instructions)
console.log(emulator.getResult()) // 1138

A simple four arithmetic compiler has been implemented . Let's write a test function and run it , Let's see if the results are what we expected :

function assert(expression, result) {
const tokens = lexicalAnalysis(expression)
const writer = new AssemblyWriter()
const parser = new Parser(tokens, writer)
const instructions = parser.getInstructions()
const emulator = new CpuEmulator(instructions)
return emulator.getResult() == result
}
console.log(assert('1 + 2 + 3', 6)) // true
console.log(assert('1 + 2 * 3', 7)) // true
console.log(assert('10 / 2 * 3', 15)) // true
console.log(assert('(10 + 10) / 2', 10)) // true

All tests are correct . Also attached Complete source code , It is suggested that the students who don't understand it should read it twice more .

Take it to the next level

For industrial compilers , The four arithmetic compilers belong to toys in toys . But you can't be a fat man at one bite , So it's better to learn compiler principles step by step . Let's introduce a more advanced compiler , This compiler can compile a Jack Language ( class Java Language ), Its grammar is like this :

class Generate {
field String str;
static String str1;
constructor Generate new(String s) {
let str = s;
return this;
}
method String getString() {
return str;
}
}
class Main {
function void main() {
var Generate str;
let str = Generate.new("this is a test");
do Output.printString(str.getString());
return;
}
}

The output of the above code is :this is a test.

Do you want to implement such a compiler ?

This compiler comes from a Book 《 Computer system elements 》, It starts from the 6 Chapter begins , All the way to 11 Chapter describes the assembly compiler ( Convert assembly language to machine language )、VM compiler ( Will be similar to bytecode VM Language is translated into assembly language )、Jack Language compiler ( High level language Jack Translate into VM Language ). Each chapter has detailed explanation and experiment of knowledge points , As long as you do the experiment step by step , We can finally implement such a compiler .

If the compiler is finished , Finally, where does machine language execute ?

This book has been considered for you , It starts from the 1 The first chapter 5 Chapter , There are five chapters . Teach you to start with logic gates , Step by step, the arithmetic logic unit is formed ALU、CPU、 Memory , And finally build a modern computer . And then let the program that you compile with a compiler run on this computer .

in addition , The first of this book 12 Chapter will teach you how to write various library functions of the operating system , for example Math library ( It involves all kinds of mathematical operations )、Keyboard library ( How to press the keyboard is output to the screen )、 Memory management and so on .

I want to have a look at the whole book 12 What happened after Zhang's experiment ? I'm here to provide a few pictures of this analog computer running program DEMO GIF, For your reference .

The top right corner of these pictures is “ Computer ” The screen of , The other parts are “ Computer ” Stack area and instruction area of .

I have done all the experiments in this book ( Spend every day 3 Hours , It can be done in two months ), The answer is in my github On , If you are interested, you can have a look at .

Reference material

版权声明
本文为[Tan Guangzhi]所创,转载请带上原文链接,感谢

  1. 【计算机网络 12(1),尚学堂马士兵Java视频教程
  2. 【程序猿历程,史上最全的Java面试题集锦在这里
  3. 【程序猿历程(1),Javaweb视频教程百度云
  4. Notes on MySQL 45 lectures (1-7)
  5. [computer network 12 (1), Shang Xuetang Ma soldier java video tutorial
  6. The most complete collection of Java interview questions in history is here
  7. [process of program ape (1), JavaWeb video tutorial, baidu cloud
  8. Notes on MySQL 45 lectures (1-7)
  9. 精进 Spring Boot 03:Spring Boot 的配置文件和配置管理,以及用三种方式读取配置文件
  10. Refined spring boot 03: spring boot configuration files and configuration management, and reading configuration files in three ways
  11. 精进 Spring Boot 03:Spring Boot 的配置文件和配置管理,以及用三种方式读取配置文件
  12. Refined spring boot 03: spring boot configuration files and configuration management, and reading configuration files in three ways
  13. 【递归,Java传智播客笔记
  14. [recursion, Java intelligence podcast notes
  15. [adhere to painting for 386 days] the beginning of spring of 24 solar terms
  16. K8S系列第八篇(Service、EndPoints以及高可用kubeadm部署)
  17. K8s Series Part 8 (service, endpoints and high availability kubeadm deployment)
  18. 【重识 HTML (3),350道Java面试真题分享
  19. 【重识 HTML (2),Java并发编程必会的多线程你竟然还不会
  20. 【重识 HTML (1),二本Java小菜鸟4面字节跳动被秒成渣渣
  21. [re recognize HTML (3) and share 350 real Java interview questions
  22. [re recognize HTML (2). Multithreading is a must for Java Concurrent Programming. How dare you not
  23. [re recognize HTML (1), two Java rookies' 4-sided bytes beat and become slag in seconds
  24. 造轮子系列之RPC 1:如何从零开始开发RPC框架
  25. RPC 1: how to develop RPC framework from scratch
  26. 造轮子系列之RPC 1:如何从零开始开发RPC框架
  27. RPC 1: how to develop RPC framework from scratch
  28. 一次性捋清楚吧,对乱糟糟的,Spring事务扩展机制
  29. 一文彻底弄懂如何选择抽象类还是接口,连续四年百度Java岗必问面试题
  30. Redis常用命令
  31. 一双拖鞋引发的血案,狂神说Java系列笔记
  32. 一、mysql基础安装
  33. 一位程序员的独白:尽管我一生坎坷,Java框架面试基础
  34. Clear it all at once. For the messy, spring transaction extension mechanism
  35. A thorough understanding of how to choose abstract classes or interfaces, baidu Java post must ask interview questions for four consecutive years
  36. Redis common commands
  37. A pair of slippers triggered the murder, crazy God said java series notes
  38. 1、 MySQL basic installation
  39. Monologue of a programmer: despite my ups and downs in my life, Java framework is the foundation of interview
  40. 【大厂面试】三面三问Spring循环依赖,请一定要把这篇看完(建议收藏)
  41. 一线互联网企业中,springboot入门项目
  42. 一篇文带你入门SSM框架Spring开发,帮你快速拿Offer
  43. 【面试资料】Java全集、微服务、大数据、数据结构与算法、机器学习知识最全总结,283页pdf
  44. 【leetcode刷题】24.数组中重复的数字——Java版
  45. 【leetcode刷题】23.对称二叉树——Java版
  46. 【leetcode刷题】22.二叉树的中序遍历——Java版
  47. 【leetcode刷题】21.三数之和——Java版
  48. 【leetcode刷题】20.最长回文子串——Java版
  49. 【leetcode刷题】19.回文链表——Java版
  50. 【leetcode刷题】18.反转链表——Java版
  51. 【leetcode刷题】17.相交链表——Java&python版
  52. 【leetcode刷题】16.环形链表——Java版
  53. 【leetcode刷题】15.汉明距离——Java版
  54. 【leetcode刷题】14.找到所有数组中消失的数字——Java版
  55. 【leetcode刷题】13.比特位计数——Java版
  56. oracle控制用户权限命令
  57. 三年Java开发,继阿里,鲁班二期Java架构师
  58. Oracle必须要启动的服务
  59. 万字长文!深入剖析HashMap,Java基础笔试题大全带答案
  60. 一问Kafka就心慌?我却凭着这份,图灵学院vip课程百度云