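Schools usually teach computer networking from Computer Networking: A Top-Down Approach, but I think this book could serve as a networking textbook too.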
A protocol stack is a logical grouping of protocols that work together.
Many protocols commonly address the following issues:
Application layer: provides a means for users to actually access network resources.
Transport layer: through flow control, segmentation/desegmentation, and error control, the transport layer makes sure data gets from point to point error-free. Because ensuring reliable data transportation can be extremely cumbersome, the OSI model devotes an entire layer to it.
Network layer: routing data between physical networks (routing and forwarding).
Data link layer: transporting data across a physical network.
Physical layer: conversion between bits and signals on the medium (A/D and D/A).
The term packet refers to a complete protocol data unit that includes header and footer information from all layers of the OSI model.
Keep in mind that not every packet on a network is generated from an application layer protocol.
Hub: A hub is no more than a repeating device that operates on the physical layer of the OSI model. It takes packets sent from one port and transmits (repeats) them to every other port on the device.
(Half duplex, generates a lot of useless traffic, operates at the physical layer.)
Switch: Like a hub, a switch is designed to repeat packets. However, unlike a hub, rather than broadcasting data to every port, a switch sends data to only the computer for which the data is intended.
(Full duplex; forwards frames only toward the destination MAC address, which makes it a data link layer device.)
Router: Routers operate at layer 3 of the OSI model, where they are responsible for forwarding packets between two or more networks. Routers commonly use layer 3 addresses (such as IP addresses) to uniquely identify devices on a network.
Broadcast: A broadcast packet is one that is sent to all ports on a network segment, regardless of whether that port belongs to a hub or switch. All broadcast traffic is not created equally, however. There are layer 2 and layer 3 forms of broadcast traffic. For instance, on layer 2, the MAC address FF:FF:FF:FF:FF:FF is the reserved broadcast address, and any traffic sent to this address is broadcast to the entire network segment. Layer 3 also has a specific broadcast address. The highest possible IP address in an IP network range is reserved for use as the broadcast address.
The extent to which broadcast packets travel is called the broadcast domain. A device’s broadcast domain extends until it reaches a router.
(ARP discovery is exactly such a data link layer broadcast: it asks which MAC address belongs to a given IP.)
Multicast: The primary method of implementing multicast is via an addressing scheme that joins the packet recipients to a multicast group, which is how IP multicast works. This addressing scheme ensures that the packets cannot be transmitted to computers to which they are not destined. In fact, IP devotes an entire range of addresses to multicast. If you see an IP address in the 224.0.0.0 to 239.255.255.255 range, it is most likely multicast traffic.
A simple network in which all devices are connected via hubs or switches is called a local area network (LAN). When you want to connect two LANs together, you can do so with a router. Complex networks can consist of thousands of LANs connected through thousands of routers worldwide.
The Internet itself is a collection of millions of LANs and routers.
A key decision for effective packet analysis is where to position a packet sniffer to appropriately capture the data.
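(Placing a sniffer into the network is called tapping into the wire, a name taken from the third of the methods below, using a tap.)
Most operating systems (including Windows) will not let you use a network interface card in promiscuous mode unless you have elevated user privileges.
Sniffing a hub network is the simplest case: just plug the sniffer into the same hub. Hubs, however, are largely obsolete, so what follows covers switched networks.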
On a switched network, when you connect a sniffer to a port on a switch, you can see only broadcast traffic and the traffic transmitted and received by your machine.
There are four primary ways to capture traffic from a target device on a switched network: port mirroring, hubbing out, using a tap, and ARP cache poisoning.
The author also mentions a fifth method: install a packet sniffing application on the single device from which we want to capture traffic. This is the "direct install" entry in the book's comparison table.
As analysts, we need to be as stealthy as possible. In a perfect world, we collect the data we need without leaving a footprint. Just as forensic investigators don’t want to contaminate a crime scene, we don’t want to contaminate our captured network traffic. (This passage explains why ARP cache poisoning is called "sloppy": it has to inject new packets into the network.)
Next come routed networks.
All of the techniques for tapping into the wire on a switched network are available on routed networks as well.
However, because a routed network consists of multiple segments, it is often necessary to sniff the traffic of multiple devices on multiple segments in order to pinpoint a problem.
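Captures can be saved or exported, and you can choose to save only the packets that match given criteria. Exporting lets you view the data elsewhere or analyze it with other tools.
Saved capture files can be merged.
Ctrl-F searches for packets. There are three search types; the one I had been using is the display filter.
Ctrl-N and Ctrl-B move forward and backward through the search results.
Packets can also be marked: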
To mark a packet, right-click it in the Packet List pane and choose Mark Packet from the pop-up or click a packet in the Packet List pane and press CTRL-M. To unmark a packet, toggle this setting off using CTRL-M again. You can mark as many packets as you wish in a capture. To jump forward and backward between marked packets, press SHIFT-CTRL-N and SHIFT-CTRL-B, respectively.
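Capture files can be printed.
Wireshark can show the absolute timestamp indicating the exact moment when the packet was captured, as well as the time in relation to the last captured packet (or the last displayed packet) and the beginning and end of the capture.
In addition, any packet can be set as a time reference, so that later times are measured from it.
Wireshark can also save a capture automatically, splitting it across multiple files.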
For instance, you can create a trigger that creates a new file after every 1MB of traffic captured, or after every minute of traffic captured.
Similarly, you can set triggers that stop the capture.
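Aside: https://www.cnblogs.com/mauricewei/p/10502300.html
I looked this up because the book mentions a ring buffer, i.e. a circular buffer.
Displaying captured packets in real time and automatically scrolling to the bottom are processor-intensive operations, and both options can be unchecked.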
There are actually two kinds of filters: capture filters and display filters. The ones I had used before were display filters.
A simple example of when you might use a capture filter is when capturing traffic on a server with multiple roles.
Capture filters are applied by WinPcap and use the Berkeley Packet Filter (BPF) syntax. This syntax is common in several packet-sniffing applications, mostly because most packet-sniffing applications rely on the libpcap/WinPcap libraries, which allow for the use of BPFs.
The book also covers writing filter expressions and the BPF syntax in detail; excerpts follow.
One of the real powers of the BPF syntax is the ability that it gives us to examine every byte of a protocol header in order to create very specific filters based on that data.
You can also specify the length of the data to be returned in your filter expression by appending the byte length after the offset number within the square brackets, separated by a colon.
For example, icmp[0:2] means offset 0 and length 2 bytes.
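To make the offset/length semantics concrete, here is a minimal Python sketch that mimics how a BPF expression such as icmp[0] or icmp[0:2] reads a header: it takes the bytes at that offset and treats them as one big-endian integer. The function name `bpf_field` and the sample header bytes are hypothetical, chosen only for illustration; BPF itself accepts lengths of 1, 2, or 4 bytes.

```python
def bpf_field(header: bytes, offset: int, length: int = 1) -> int:
    """Mimic BPF's proto[offset:length]: read `length` bytes (1, 2 or 4) starting at
    `offset` and interpret them as a big-endian (network byte order) integer."""
    if length not in (1, 2, 4):
        raise ValueError("BPF only accepts lengths of 1, 2 or 4 bytes")
    if offset + length > len(header):
        raise ValueError("field runs past the end of the header")
    return int.from_bytes(header[offset:offset + length], "big")


# Hypothetical ICMP echo request header: type 8, code 0, checksum (made up), id 1, seq 1
icmp_header = bytes([0x08, 0x00, 0x4D, 0x5A, 0x00, 0x01, 0x00, 0x01])

print(bpf_field(icmp_header, 0))      # icmp[0]   -> 8 (type)
print(bpf_field(icmp_header, 0, 2))   # icmp[0:2] -> 2048, i.e. 0x0800 (type 8, code 0)
```

So a capture filter like icmp[0] == 8 keeps only echo requests, because byte 0 of the ICMP header is the Type field.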
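So knowing what each header field of each protocol means is essential for writing expressions.
Wireshark also provides a GUI for combining ready-made filters, though you can of course write the expression yourself.
Note that of the two figures (one above, one below), one shows a capture filter and the other a display filter.
Chapter 5 of the book covers advanced Wireshark features (capture, reformat, statistics).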
Aside: there are three private (internal) IPv4 ranges: 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16.
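A small sketch for checking whether an address falls in one of these three ranges, using Python's standard `ipaddress` module. The ranges are listed explicitly because `ip_address(...).is_private` also returns True for other special ranges (loopback, link-local, and so on); `is_rfc1918` is just an illustrative name.

```python
import ipaddress

# The three private ranges listed above
PRIVATE_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr: str) -> bool:
    """Return True if addr lies inside one of the three private ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in PRIVATE_NETS)


for a in ["10.1.2.3", "172.31.255.255", "172.32.0.1", "192.168.1.10", "8.8.8.8"]:
    print(a, is_rfc1918(a))
```

Note that 172.16.0.0/12 ends at 172.31.255.255, so 172.32.0.1 is already a public address.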
Notice that not all totals add up to exactly 100 percent, because many of the packets contain multiple protocols from various layers.
The Protocol Hierarchy Statistics window is often one of the first windows you look at when examining traffic. It really gives you a good snapshot of the type of activity occurring on a network.
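By configuring Wireshark's capture options, you can enable Wireshark's own name resolution to make captures easier to read. There are three kinds: MAC, network (IP), and transport (port) name resolution.
Security guides often advise against running services on their default ports. Such nonstandard configurations cause the following problem: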
Unfortunately, Wireshark does not always make the right choices when selecting the dissector to use on a packet. This is especially true when it is using a protocol on the network in a nonstandard configuration, such as a non-default port (which is often configured by network administrators as a security precaution or by employees trying to circumvent access controls). Luckily, we can change the way Wireshark implements certain dissectors.
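In the book's example, port 443 carries FTP, and Wireshark can be told to decode it correctly (with Decode As).
The source code can be browsed under Develop > Browse the Code on the Wireshark website. The files starting with packet- under epan/dissectors are the dissectors (decoders) for the individual protocols.
Wireshark can follow a TCP stream: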
Rather than viewing data being sent from client to server in a bunch of small chunks, the Follow TCP Stream feature sorts the data to make it easier to view. This comes in handy when viewing plaintext application layer protocols such as HTTP, FTP, and so on.
Wireshark can compute packet length statistics:
Under normal circumstances, the maximum size of a frame on an Ethernet network is 1,518 bytes. When you subtract the Ethernet, IP, and TCP headers from this number, that leaves you with 1,460 bytes that can be used for the transmission of a layer 7 protocol header or data. The Ethernet header is 14 bytes (plus a 4-byte CRC), the IP header is a minimum of 20 bytes, and a TCP packet with no data or options is also 20 bytes.
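The subtraction above can be spelled out step by step. This sketch assumes a standard Ethernet frame with no VLAN tag and no IP or TCP options, exactly the case the text describes; each intermediate value is one of the numbers worth remembering.

```python
# Byte counts from the text above (standard Ethernet, no VLAN tag, no IP/TCP options)
MAX_FRAME = 1518     # largest standard Ethernet frame
ETH_HEADER = 14
ETH_FCS = 4          # the 4-byte CRC (frame check sequence)
IP_HEADER = 20       # minimum IPv4 header
TCP_HEADER = 20      # minimum TCP header

mtu = MAX_FRAME - ETH_HEADER - ETH_FCS   # space available to IP: the MTU
ip_payload = mtu - IP_HEADER             # space available to TCP
tcp_payload = ip_payload - TCP_HEADER    # space left for layer 7 (the usual MSS)

print(mtu, ip_payload, tcp_payload)      # 1500 1480 1460
```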
If there are a lot of large packets, it may be safe to assume that data is being transferred. If the majority of packets are small, you may assume that the capture consists of protocol control commands, without a great deal of data being passed.
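In the book's figure, the highlighted rows are large packets, and the 40-79 byte range counts as small packets.
Wireshark can compute live IO statistics and round-trip times and plot them as graphs.
Wireshark provides Expert Info, which flags anomalies in the traffic; the book lists every possible warning message.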
The ARP resolution process uses only two packets: an ARP request and an ARP response.
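You can view the ARP table of a Windows host by typing arp -a from a command prompt.
Operation field: either 1 for a request or 2 for a reply.
The hardware type is usually Ethernet, with a hardware address length of 6 bytes (a MAC address is 48 bits).
In an ordinary ARP request, the missing piece is the MAC address that belongs to the target IP.
A gratuitous ARP request is broadcast so that a device can tell everyone its own IP and MAC, because a device's IP address may change. Such packets are recognizable because the sender IP address and the target IP address are the same.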
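The fields above fit in a fixed 28-byte payload for Ethernet/IPv4, so they can be decoded with `struct`. This is a minimal sketch: `parse_arp` and `is_gratuitous` are illustrative names, and the MAC and IP addresses in the example are made up. It builds a gratuitous request (opcode 1, protocol type 0x0800 for IPv4) and checks the "sender IP equals target IP" rule.

```python
import socket
import struct

ARP_FORMAT = "!HHBBH6s4s6s4s"   # 28-byte ARP payload for Ethernet/IPv4


def parse_arp(payload: bytes) -> dict:
    htype, ptype, hlen, plen, oper, sha, spa, tha, tpa = struct.unpack(ARP_FORMAT, payload[:28])
    return {
        "hardware_type": htype,          # 1 = Ethernet
        "hardware_len": hlen,            # 6 for a 48-bit MAC
        "opcode": oper,                  # 1 = request, 2 = reply
        "sender_mac": sha.hex(":"),
        "sender_ip": socket.inet_ntoa(spa),
        "target_mac": tha.hex(":"),      # all zeros in a request: this is what is being asked for
        "target_ip": socket.inet_ntoa(tpa),
    }


def is_gratuitous(arp: dict) -> bool:
    # Gratuitous ARP: the sender announces its own IP, so sender IP == target IP
    return arp["sender_ip"] == arp["target_ip"]


mac = bytes.fromhex("001122334455")
ip = socket.inet_aton("192.168.1.20")
garp = struct.pack(ARP_FORMAT, 1, 0x0800, 6, 4, 1, mac, ip, b"\x00" * 6, ip)
info = parse_arp(garp)
print(info["opcode"], info["sender_ip"], is_gratuitous(info))   # 1 192.168.1.20 True
```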
An IP address consists of two parts: a network address and a host address. The network address identifies the LAN the device is connected to, and the host address identifies the device itself on that network.
IP addresses and netmasks are commonly written in Classless Inter-Domain Routing (CIDR) notation for shorthand. For example, an IP address of 10.10.1.22 and a netmask of 255.255.0.0 would be written in CIDR notation as 10.10.1.22/16.
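The CIDR example can be checked with Python's `ipaddress` module, which accepts both the /16 form and the dotted netmask form. The sketch splits 10.10.1.22/16 into its network and host parts and also shows the broadcast address (the highest address in the range) mentioned earlier.

```python
import ipaddress

iface = ipaddress.ip_interface("10.10.1.22/16")
dotted = ipaddress.ip_interface("10.10.1.22/255.255.0.0")   # same thing, netmask spelled out

network_part = iface.network                                              # 10.10.0.0/16
host_part = ipaddress.ip_address(int(iface.ip) & int(iface.hostmask))     # 0.0.1.22

print(iface.netmask, network_part, host_part)   # 255.255.0.0 10.10.0.0/16 0.0.1.22
print(dotted.with_prefixlen)                    # 10.10.1.22/16
print(network_part.broadcast_address)           # 10.10.255.255
```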
Type of Service
A precedence flag and type of service flag, which are used by routers to prioritize traffic.
Identification
A unique identification number used to identify a packet or sequence of fragmented packets.
Flags (the More Fragments flag: 1 means more fragments follow)
Used to identify whether or not a packet is part of a sequence of fragmented packets.
Fragment Offset
If a packet is a fragment, the value of this field is used to reassemble the packets in the correct order.
Typical values shown by Wireshark are 1480, 2960, and so on (in bytes; the field itself counts 8-byte units).
Options
Reserved for additional IP options. It includes options for source routing and timestamps.
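Normally, the route is decided by the routing tables of the routers along the way; with source routing, the sender decides it instead.
Source routing lets the sender specify the path a packet takes. It can be used to test the throughput of a particular network, or to route packets around a faulty network.
The TTL exists mainly to prevent packets from looping forever. It is the maximum number of routers a packet may pass through: each router decrements it by one, and the packet is discarded when it reaches 0.
On MTU and fragmentation: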
Although there are standard MTU settings, the MTU of a device can be reconfigured manually in most cases. An MTU setting is assigned on a per-interface basis and can be modified on Windows and Linux systems, as well as on the interfaces of managed routers.
Numbers to remember: 1518, 1500, 1480, 1460, 14+4, 20, 20.
If the data size is greater than the MTU, the packet will be fragmented.
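A minimal sketch of how an IP payload is split into fragments, which explains the typical offsets 1480 and 2960 noted above. It assumes a 20-byte IP header with no options; `fragment` is an illustrative name. Every fragment except the last carries a multiple of 8 bytes because the Fragment Offset field counts 8-byte units.

```python
def fragment(payload_len: int, mtu: int = 1500, ip_header: int = 20) -> list:
    """Split an IP payload into fragments.
    Returns tuples of (offset_bytes, offset_field, data_length, more_fragments)."""
    max_data = (mtu - ip_header) // 8 * 8      # 1480 for a 1500-byte MTU
    frags = []
    offset = 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        more = offset + size < payload_len     # MF flag: set on all but the last fragment
        frags.append((offset, offset // 8, size, more))
        offset += size
    return frags


for f in fragment(4000):
    print(f)
# (0, 0, 1480, True)
# (1480, 185, 1480, True)
# (2960, 370, 1040, False)
```

Wireshark displays the offset in bytes (1480, 2960), while the header field itself holds 185 and 370.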
TCP:
All TCP-based communication works the same way: a random source port is chosen to communicate to a known destination port.
TCP header:
TCP ports:
Port numbers are 16 bits, giving 65,535 usable ports (port 0 is reserved); ports 1-1023 are the standard (well-known) ports. Wireshark also keeps a list for ports 1024 and above, but it is for reference only, and you can customize it. If you wish to leave this functionality enabled but want to change how Wireshark identifies a certain port, you can do so by modifying the Services file located in the Wireshark program directory, which is based on the Internet Assigned Numbers Authority (IANA) common ports listing.
IANA assigns ports 1-1023; the link is:
http://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml
The book captures and analyzes the TCP three-way handshake.
TCP four-way teardown:
TCP reset:
In an ideal world, every connection would end gracefully with a TCP teardown. In reality, connections often end abruptly. For example, this may occur due to a potential attacker performing a port scan or simply a misconfigured host. In these cases, a TCP packet with the RST flag set is used. The RST flag is used to indicate a connection was closed abruptly or to refuse a connection attempt.
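(For example, if a source sends a packet to a port on which the destination is not listening, it gets an RST back.)
The RST packet contains nothing other than RST and ACK flags, and no further communication follows.
Why not a two-way handshake? Because both sides' initial sequence numbers have to be acknowledged.
Why three packets to open but four to close? Because of half-close: the side that receives the first FIN may still have data to send, so its ACK and its own FIN generally cannot be merged into one packet.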
UDP's goal, and what connectionless means:
While TCP is designed for reliable data delivery with built-in error checking, UDP aims to provide speedy transmission. For this reason, UDP is a best-effort service, commonly referred to as a connectionless protocol. A connectionless protocol does not formally establish and terminate a connection between hosts, unlike TCP with its handshake and teardown processes.
How UDP's unreliability is compensated for:
The protocols that rely on UDP typically have their own built-in reliability services, or use certain features of ICMP to make the connection somewhat more reliable. For example, the application-layer protocols DNS and DHCP, which are highly dependent on the speed of packet transmission across a network, use UDP as their transport layer protocol, but they handle error checking and retransmission timers themselves.
In short: either tolerate packet loss, or provide reliability at the application layer.
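UDP header:
https://blog.csdn.net/u013929635/article/details/80254973 (why UDP has a length field but TCP doesn't)
From an information-redundancy standpoint, UDP's length field is unnecessary.
The IPv4 header already records the total length. Subtracting the IP header length leaves the size of what IP carries, which for UDP is the UDP header plus the UDP data; since the UDP header is always 8 bytes, the length of the UDP data follows easily.
Without this field, the UDP header would be only 6 bytes and no longer aligned to 32 bits.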
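The redundancy described above can be checked directly: parse the 8-byte UDP header and compare its Length field with the IPv4 Total Length minus the IP header length. This is a sketch with an illustrative function name and a made-up packet; both checksums are left at 0 for brevity (for UDP over IPv4, 0 means "no checksum").

```python
import socket
import struct


def parse_udp(ip_packet: bytes) -> dict:
    """Parse the UDP header inside an IPv4 packet and derive the same length from IPv4."""
    ihl = (ip_packet[0] & 0x0F) * 4                          # IPv4 header length in bytes
    total_length = struct.unpack("!H", ip_packet[2:4])[0]    # IPv4 Total Length field
    src, dst, length, checksum = struct.unpack("!HHHH", ip_packet[ihl:ihl + 8])
    return {
        "src_port": src,
        "dst_port": dst,
        "udp_length": length,                   # UDP header (8 bytes) + data
        "checksum": checksum,
        "length_from_ip": total_length - ihl,   # the same value, derived from IPv4
        "data_length": length - 8,
    }


# Hypothetical IPv4 + UDP packet carrying 5 bytes of data
data = b"hello"
ip_header = struct.pack("!BBHHHBBH4s4s",
                        0x45, 0, 20 + 8 + len(data),   # version 4 / IHL 5, TOS, total length
                        0, 0,                          # identification, flags/fragment offset
                        64, 17, 0,                     # TTL, protocol 17 = UDP, checksum
                        socket.inet_aton("192.168.1.10"), socket.inet_aton("192.168.1.1"))
udp_header = struct.pack("!HHHH", 53000, 53, 8 + len(data), 0)
info = parse_udp(ip_header + udp_header + data)
print(info["udp_length"], info["length_from_ip"], info["data_length"])   # 13 13 5
```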
The IP, TCP, and UDP headers each carry a 16-bit checksum. The IP header checksum covers only the IP header, not the data that follows.
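All these 16-bit checksums use the same one's-complement algorithm (RFC 1071). A minimal sketch: sum the data as 16-bit big-endian words, fold the carries back in, and invert. The sample is a widely used example IPv4 header with its checksum field zeroed; for TCP and UDP the sum also covers a pseudo-header, which is not shown here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by IPv4, ICMP, UDP and TCP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"                             # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]       # add 16-bit big-endian words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)    # fold carries back into 16 bits
    return ~total & 0xFFFF


# Sample IPv4 header with the checksum field (bytes 10-11) set to zero
header = bytes.fromhex("4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7")
csum = internet_checksum(header)
print(hex(csum))                    # 0xb861

filled = header[:10] + csum.to_bytes(2, "big") + header[12:]
print(internet_checksum(filled))    # 0: a receiver recomputing over the header gets zero
```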
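ICMP:
For example, a Type field value of 3 indicates “Destination Unreachable,” and within that type a Code field value of 3 indicates “Port Unreachable.” For a full list of available ICMP types and codes, see http://www.iana.org/assignments/icmp-parameters.
The length of an ICMP packet can vary, which makes it useful for testing fragmentation. The data in an ICMP echo packet is arbitrary, so it can potentially be used for covert communication.
In an ordinary echo request/reply pair, the variable part of the header (identifier and sequence number) is the same in both packets, which shows that the reply answers that request.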
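A minimal sketch of building an echo request and reply (type 8 and type 0, code 0) and matching them by identifier and sequence number. The function names and payload are illustrative; the checksum helper repeats the RFC 1071 algorithm so the block runs on its own.

```python
import struct


def checksum(data: bytes) -> int:
    # RFC 1071 one's-complement sum, the same algorithm as the IP header checksum
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def echo(icmp_type: int, ident: int, seq: int, data: bytes) -> bytes:
    """Build an ICMP echo request (type 8) or echo reply (type 0), code 0."""
    header = struct.pack("!BBHHH", icmp_type, 0, 0, ident, seq)    # checksum zeroed first
    csum = checksum(header + data)
    return struct.pack("!BBHHH", icmp_type, 0, csum, ident, seq) + data


def matches(request: bytes, reply: bytes) -> bool:
    """A reply answers a request when identifier and sequence number are equal."""
    req_type, _, _, req_id, req_seq = struct.unpack("!BBHHH", request[:8])
    rep_type, _, _, rep_id, rep_seq = struct.unpack("!BBHHH", reply[:8])
    return req_type == 8 and rep_type == 0 and (req_id, req_seq) == (rep_id, rep_seq)


payload = b"ping test data"                      # arbitrary bytes
req = echo(8, ident=1, seq=7, data=payload)
rep = echo(0, ident=1, seq=7, data=payload)      # the reply echoes the data back
print(matches(req, rep), checksum(req))          # True 0
```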
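Besides ping, ICMP is also used for traceroute. The idea is to send probes with an increasing TTL, so that every router along the path replies once (with an ICMP Time Exceeded message). Each reply carries the IP header and ICMP header of the probe that triggered it.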
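The increasing-TTL idea can be modeled without touching the network. This toy simulation (hypothetical function name and documentation-range addresses) follows the ICMP-echo variant used by Windows tracert: each router decrements the TTL and answers with Time Exceeded when it reaches 0, and once the TTL is large enough the destination answers with an Echo Reply.

```python
def traceroute(path: list, destination: str, max_hops: int = 30) -> list:
    """Toy model: probe with TTL 1, 2, 3, ... and record who answers each probe."""
    hops = []
    for ttl in range(1, max_hops + 1):
        remaining = ttl
        for router in path:
            remaining -= 1                  # every router decrements the TTL
            if remaining == 0:
                hops.append(f"{router} (time exceeded)")
                break
        else:                               # the probe survived every router
            hops.append(f"{destination} (echo reply)")
            return hops
    return hops


for line in traceroute(["192.168.1.1", "10.0.0.1", "203.0.113.1"], "198.51.100.7"):
    print(line)
```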
IP, TCP, UDP, and ICMP are at the foundation of all network communications.
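DHCP:
Today's DHCP servers usually hand out an IP address, a DNS server address, and a default gateway together.
DHCP runs over UDP (a capture shows all five layers, so the transport layer protocol is visible).
Wireshark still references BOOTP when dealing with DHCP; BOOTP is DHCP's predecessor.
DHCP uses two ports: the server listens on UDP 67 and the client on UDP 68.
DORA
Discover: src 0.0.0.0, dst 255.255.255.255, because the client has no IP address yet and does not know the DHCP server's IP.
Offer: the DHCP server puts the IP address it offers into the Your IP Address field, along with its own IP address. The options carry information such as the lease time. The offer is first sent to the client's MAC address, and broadcast if that doesn't work.
Request: src 0.0.0.0, dst 255.255.255.255, but the options include the IP address the client is requesting and the IP of the DHCP server.
Acknowledgment: similar in content to the offer, except that the Next Server IP field is empty.
See the four figures in the book for details.
Renewing a lease uses only the last two steps.
DHCP is flexible: the type (code) field of each option determines the rest of that option's contents.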
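Because each option begins with a code that decides how the rest is read, DHCP options are a simple code/length/value sequence. This sketch assumes `data` starts right after the 4-byte magic cookie (63 82 53 63); the function name and the sample OFFER options are illustrative. Codes used: 53 message type, 51 lease time, 54 server identifier, with 0 as padding and 255 as end.

```python
import socket
import struct

MESSAGE_TYPES = {1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 5: "ACK"}   # subset of option 53 values


def parse_dhcp_options(data: bytes) -> dict:
    """Walk DHCP options as code/length/value triples.
    Code 0 is a one-byte pad and code 255 marks the end; neither has a length byte."""
    options = {}
    i = 0
    while i < len(data):
        code = data[i]
        if code == 0:          # pad
            i += 1
            continue
        if code == 255:        # end
            break
        length = data[i + 1]
        options[code] = data[i + 2:i + 2 + length]
        i += 2 + length
    return options


# Hypothetical options from an OFFER
raw = (bytes([53, 1, 2])
       + bytes([0])                                             # a pad byte
       + bytes([51, 4]) + struct.pack("!I", 86400)
       + bytes([54, 4]) + socket.inet_aton("192.168.1.1")
       + bytes([255]))
opts = parse_dhcp_options(raw)
print(MESSAGE_TYPES[opts[53][0]])          # OFFER
print(struct.unpack("!I", opts[51])[0])    # 86400 (lease time in seconds)
print(socket.inet_ntoa(opts[54]))          # 192.168.1.1 (DHCP server identifier)
```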
DOMAIN NAME SYSTEM
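DNS normally runs over UDP as well.
The other three count fields work the same way, and so do the other three sections.
The DNS ID matches queries with responses; bits 16-31 hold the flag bits and the response code.
QR: query/response
AA: authoritative answer
TC: truncation (the response was too large and was truncated)
RD: recursion desired
RA: recursion available
Z: reserved
RCode: response code
DNS servers use port 53 by default.
A DNS response also contains the question from the query.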
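The ID, flag bits, and four counts make up the fixed 12-byte DNS header, so they can be decoded with a few shifts. A minimal sketch with an illustrative function name; the example header is made up, and 0x8180 is the flags value of a standard response with RD and RA set and no error.

```python
import struct


def parse_dns_header(msg: bytes) -> dict:
    """Decode the 12-byte DNS header: ID, flag bits, RCODE and the four counts."""
    ident, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", msg[:12])
    return {
        "id": ident,                  # matches a response to its query
        "qr": flags >> 15 & 1,        # 0 = query, 1 = response
        "opcode": flags >> 11 & 0xF,
        "aa": flags >> 10 & 1,        # authoritative answer
        "tc": flags >> 9 & 1,         # truncated
        "rd": flags >> 8 & 1,         # recursion desired
        "ra": flags >> 7 & 1,         # recursion available
        "z": flags >> 4 & 0x7,        # reserved
        "rcode": flags & 0xF,         # 0 = no error, 3 = NXDOMAIN
        "counts": (qd, an, ns, ar),   # question, answer, authority, additional
    }


# Hypothetical response header: ID 0x1a2b, flags 0x8180, 1 question, 2 answers
hdr = parse_dns_header(struct.pack("!HHHHHH", 0x1A2B, 0x8180, 1, 2, 0, 0))
print(hdr["qr"], hdr["rd"], hdr["ra"], hdr["rcode"], hdr["counts"])   # 1 1 1 0 (1, 2, 0, 0)
```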
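The IXFR and AXFR entries in the figure are zone transfers. Each DNS server manages its own part of the namespace; sometimes a backup DNS server is needed, and keeping it in sync requires a zone transfer. This process uses TCP.
Servers closer to the leaves of the tree are called the authoritative servers for their subdomains.
The figure resembles a network topology, which is why leakage is a real risk. The data contained in a zone transfer can be very dangerous in the wrong hands. For example, by enumerating a single DNS server, you can map a network’s entire infrastructure.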
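HTTP:
The structure of HTTP packets varies widely depending on their purpose.
HTTP/1.1 defines 8 request methods.
In the figure above, the client is sending a request to download (GET) the root web directory (/) of the web server using version 1.1 of HTTP.
The User-Agent and Accept-Language headers let the server return an appropriate version of the page.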
HTTP is used only to issue application layer commands between the client and server. When it’s time to transfer data, application layer control is not seen, except for at the beginning and end of the data stream.
A single HTTP request may produce many packets that Wireshark labels simply as TCP.
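For reference, a request like the one in the figure is just text. This sketch assembles a minimal HTTP/1.1 GET; the function name, host, and header values are hypothetical. Sent over a TCP socket, the request is one short segment, while the response data typically spans many segments, which is why so many packets in the capture are labeled only TCP.

```python
def build_get(host: str, path: str = "/", user_agent: str = "example-agent/1.0",
              language: str = "en-US") -> bytes:
    """Assemble a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",          # method, resource, protocol version
        f"Host: {host}",                 # required in HTTP/1.1
        f"User-Agent: {user_agent}",     # lets the server tailor its response
        f"Accept-Language: {language}",
        "Connection: close",
        "",                              # an empty line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


print(build_get("example.com").decode())
```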
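The following cases are the kind of problem that 360's 断网急救箱 (network first-aid kit) fixes.
Only one computer can't get online
A capture shows the computer sending the same DNS request over and over, which means its DNS requests are never answered.
However, the other computers on the internal network can get online, so they can reach the external DNS server (routing is fine), and the external DNS server itself works.
Answer: this computer doesn't use DHCP and its default gateway is misconfigured, so its DNS requests never reach the external DNS server.
Only one computer can't reach Google
A capture shows that when the computer tries to reach Google, it correctly obtains the default gateway's address but sends no DNS request at all. It then opens a connection to port 80 on some IP and receives an RST, which means either Google is down or that IP isn't Google at all.
Sending no DNS request means the name is either in the machine's DNS cache or configured in its hosts file.
Answer: the machine's hosts file maps Google to an internal address, and the host at that address has no service on port 80.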
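No computer can reach Google
A capture shows DNS requests and responses, and the responses are correct.
Many TCP SYN packets go out, but none get any response.
Either the SYNs are being filtered, or Google has a problem.
Answer: Google was temporarily having problems.
The printer prints only the first few pages
A capture shows the printer acknowledging the early data packets, but then TCP retransmissions appear. Adjusting Wireshark's time display shows that the first unacknowledged packet is retransmitted after 5.5 seconds. After that, the printer never responds again.
Different computers all hit the same problem, so the printer is the likely culprit.
Answer: the printer's RAM was faulty; whenever a certain location in RAM was accessed, the printer stopped working.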
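The branch office can't connect to headquarters
Capturing on a branch office computer: it sends DNS requests, but the branch DNS server replies with a failure.
Capturing on the branch DNS server: it opens a TCP connection (rather than using UDP) to port 53 on the headquarters DNS server.
This suggests that a DNS zone transfer is taking place, and failing.
Capturing on the headquarters DNS server: no requests from the branch DNS server arrive.
Answer: the firewall on the headquarters router blocks TCP connections to port 53.
The data at the receiver differs from the data at the sender
This may or may not be a network problem.
Capture on the receiving side, filter for the FTP STOR command to find where the transfer starts, follow the TCP stream, export it as a CSV file, and compare the MD5 of the exported file with that of the sender's file.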
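The final comparison step can be scripted with `hashlib`. In this sketch, `md5_of` and `same_content` are illustrative names, and the demo writes throwaway files standing in for the sender's original and the stream exported from Wireshark. Hashing in chunks keeps memory use flat for large transfers.

```python
import hashlib
import os
import tempfile


def md5_of(path: str) -> str:
    """MD5 of a file, read in 64 KiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def same_content(sent_path: str, received_path: str) -> bool:
    return md5_of(sent_path) == md5_of(received_path)


# Demo with throwaway files
with tempfile.TemporaryDirectory() as tmp:
    paths = {}
    for name, content in [("sent.csv", b"id,value\n1,42\n"),
                          ("received_ok.csv", b"id,value\n1,42\n"),
                          ("received_bad.csv", b"id,value\n1,24\n")]:
        paths[name] = os.path.join(tmp, name)
        with open(paths[name], "wb") as f:
            f.write(content)
    identical = same_content(paths["sent.csv"], paths["received_ok.csv"])
    corrupted = not same_content(paths["sent.csv"], paths["received_bad.csv"])

print(identical, corrupted)   # True True
```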
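RTO: retransmission timeout
The number of retransmissions is capped: Windows defaults to 5, and Linux typically defaults to 15.
When the timer exceeds the RTO, the segment is retransmitted (on consecutive retransmissions, the RTO doubles each time); after three duplicate ACKs, a fast retransmission happens instead.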
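Both triggers can be sketched in a few lines. This is a simplified model, not any particular TCP stack: the initial RTO of 1 second is an assumed example value, and a "duplicate ACK" here is simply the same ACK number seen again (real stacks apply extra conditions). Function names are illustrative.

```python
def rto_schedule(initial_rto: float, max_retries: int) -> list:
    """Timeouts for successive retransmissions of one segment: the RTO doubles each time."""
    return [initial_rto * 2 ** n for n in range(max_retries)]


def fast_retransmit_index(acks: list, threshold: int = 3):
    """Index of the ACK that triggers fast retransmit (the third duplicate of the same
    ACK number by default), or None if that never happens."""
    dup = 0
    for i in range(1, len(acks)):
        if acks[i] == acks[i - 1]:
            dup += 1
            if dup == threshold:
                return i
        else:
            dup = 0
    return None


print(rto_schedule(1.0, 5))                                    # [1.0, 2.0, 4.0, 8.0, 16.0]
print(fast_retransmit_index([1001, 2461, 2461, 2461, 2461]))   # 4
```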
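Wireshark's SEQ/ACK analysis section provides information you can't get by looking at a single packet in isolation, such as retransmissions and fast retransmissions.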
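TCP flow control:
To receive more data than its window allows, the receiver must first ACK what it has and empty its buffer.
Sometimes, though, even after ACKing, the receiver can't process the data right away, and the window size has to be adjusted.
When the server finds data arriving faster than it can handle, it shrinks its window and announces the new size to its clients, so they send more slowly.
After receiving a zero window, the client periodically sends keep-alive packets, also called keep-alive probes.
A keep-alive packet either reuses the previous sequence number (a pure ACK with no data) or uses the previous sequence number minus 1 (carrying one byte of data such as 0x0).
Besides flow control, a server may also send keep-alives to its clients to clean up connections that were dropped unexpectedly.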
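The effect of the advertised window on the sender can be shown with one subtraction. This sketch ignores congestion control and window scaling, and the numbers are example values: the sender may put at most "window minus bytes in flight" new bytes on the wire, and a zero window means it must stop and fall back to the probes described above.

```python
def sendable_bytes(last_ack: int, next_seq: int, advertised_window: int) -> int:
    """New bytes the sender may transmit now: the receiver's advertised window minus
    the bytes already in flight (sent but not yet acknowledged)."""
    in_flight = next_seq - last_ack
    return max(0, advertised_window - in_flight)


print(sendable_bytes(last_ack=1, next_seq=8761, advertised_window=17520))   # 8760
print(sendable_bytes(last_ack=1, next_seq=8761, advertised_window=0))       # 0 -> stop, probe
```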
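How to find the source of latency:
First capture the six packets at the start of the connection (the three-way handshake, the HTTP request, the server's ACK of that request, and the first data packet) and look at their delays.
Around 1 s counts as high latency; normally each packet arrives within 0.1 s.
Case 1: latency caused by the wire
The three-way handshake and the server's ACK of the HTTP request need almost no processing, so if their delays are high, the problem almost certainly lies on the path (for example a firewall, proxy, or router), not on the server or client.
Case 2: latency caused by the client
Check whether the HTTP request has a high delay.
Case 3: latency caused by the server
Check whether the first data packet has a high delay.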
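The three cases can be written as a small classifier over the six timestamps. This sketch assumes the packet order listed above and uses the note's rough 1-second cutoff as a hypothetical threshold; `classify_latency` is an illustrative name, and the timestamps in the example are made up.

```python
# Assumed order: 1 SYN, 2 SYN/ACK, 3 ACK, 4 HTTP request, 5 server ACK of the request, 6 first data
THRESHOLD = 1.0   # seconds, the rough "high latency" cutoff from the note


def classify_latency(times: list) -> list:
    """times: capture timestamps (seconds) of the six packets, in order.
    Returns the suspected sources of delay."""
    if len(times) != 6:
        raise ValueError("expected exactly six timestamps")
    delta = [0.0] + [b - a for a, b in zip(times, times[1:])]   # time since previous packet
    suspects = []
    if delta[1] > THRESHOLD or delta[4] > THRESHOLD:
        suspects.append("wire")     # SYN/ACK or the ACK of the request was slow
    if delta[3] > THRESHOLD:
        suspects.append("client")   # the client was slow to send its request
    if delta[5] > THRESHOLD:
        suspects.append("server")   # the server was slow to return the first data
    return suspects


print(classify_latency([0.0, 0.03, 0.031, 0.032, 0.06, 1.6]))   # ['server']
```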
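The six-packet method isn't very precise: it can catch one packet that is 1 s late, but it will hardly notice ten packets that are each 0.1 s late. For that there is the baseline method. Fast and slow are relative, so you sample traffic over a period of time and derive a baseline to serve as the reference.
Packet captures can also be used for intrusion detection and forensics.
The two figures show that ports 53, 80, and 22 are open, because the server repeatedly answered with SYN/ACK, while ports 113, 25, 31337, and 70 are closed, because the server answered with RST. The remaining ports are probably closed, but this can't be determined for certain.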
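The reasoning above amounts to a three-way rule on the reply to each SYN. In this sketch, the flag strings ("SA" for SYN/ACK, "RA" for RST/ACK) and the function name are a hypothetical convention; ports from the figures are reused, and 443 is an added example of a port with no reply.

```python
def classify_ports(responses: dict) -> dict:
    """responses maps each probed port to the TCP flags of its reply ("SA" = SYN/ACK,
    "RA" = RST/ACK) or None when no reply came back."""
    result = {}
    for port, flags in responses.items():
        if flags is None:
            result[port] = "unknown"   # no reply: closed, filtered, or lost; can't tell
        elif "S" in flags and "A" in flags:
            result[port] = "open"      # SYN/ACK: something is listening
        elif "R" in flags:
            result[port] = "closed"    # RST: nothing is listening
        else:
            result[port] = "unknown"
    return result


scan = {53: "SA", 80: "SA", 22: "SA", 113: "RA", 25: "RA", 31337: "RA", 70: "RA", 443: None}
print(classify_ports(scan))
```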
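OS fingerprinting:
Because the RFCs don't specify everything, different operating systems implement the TCP/IP stack differently.
These implementation differences make passive fingerprinting possible.
For active OS fingerprinting, see Nmap Network Scanning, by the tool’s author Gordon “Fyodor” Lyon.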
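One classic passive clue is the initial TTL. This heuristic sketch assumes the sender started from one of the common defaults (32, 64, 128, 255) and each router subtracted one; an initial TTL of 64 is commonly seen from Linux and macOS and 128 from Windows, but this is only a hint. Real passive tools such as p0f combine many more fields (window size, TCP options and their order). The function names are illustrative.

```python
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def initial_ttl(observed: int) -> int:
    """Guess the sender's initial TTL: the smallest common default >= the observed value."""
    for ttl in COMMON_INITIAL_TTLS:
        if observed <= ttl:
            return ttl
    raise ValueError("TTL cannot exceed 255")


def hops_away(observed: int) -> int:
    return initial_ttl(observed) - observed


print(initial_ttl(57), hops_away(57))     # 64 7
print(initial_ttl(116), hops_away(116))   # 128 12
```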
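Packet analysis of a TCP reverse shell attack
An iframe (inline frame) embeds one HTML document inside another, and users don't easily notice it.
The figure shows that the script tag contains a large number of obviously encoded characters, along with JavaScript code that does the encoding and decoding. The iframe also references a strangely named GIF.
And most obviously, there is a shell in plaintext. Together, these three features are the classic signs of a TCP reverse shell.
To build a good intrusion detection system, the key is to collect enough intrusion data like this and derive patterns from it (more formally, build a signature database). To learn more about intrusion detection and attack signatures, visit the Snort project at http://www.snort.org/.
A practical example: an IDS inspects packets entering the internal network and raises an alert if they contain the hexadecimal content 41 4E 41 42 49 4C 47 49 7C, which is ANABILGI| in human-readable ASCII. This is much like old-fashioned signature-based virus detection.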
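A naive version of that content check takes only a few lines. This sketch just decodes the hex pattern and searches each payload for it; real IDS engines such as Snort add protocol context, offsets, and efficient multi-pattern matching. The function name and example payloads are illustrative.

```python
SIGNATURE = bytes.fromhex("41 4E 41 42 49 4C 47 49 7C")   # the byte pattern quoted above


def matches_signature(payload: bytes, signature: bytes = SIGNATURE) -> bool:
    """Naive content match: does the payload contain the signature bytes anywhere?"""
    return signature in payload


print(SIGNATURE)                                         # b'ANABILGI|'
print(matches_signature(b"GET /x HTTP/1.1\r\n"))        # False
print(matches_signature(b"..header..ANABILGI|data"))    # True
```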
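Four kinds of MITM (man-in-the-middle) attacks:
ARP spoofing
DNS spoofing
TCP session hijacking
SSL hijacking
Wireshark can export files, such as files transferred over FTP, JPG images, and so on. In the intrusion detection case above, for example, the header of the JPG file contains the string JFIF.
To export it cleanly, use a binary editor such as WinHex to delete the extra content in front of the image.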
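Instead of trimming by hand in a hex editor, the same cut can be scripted. This sketch keeps everything from the JPEG start-of-image marker (FF D8 FF) through the last end-of-image marker (FF D9); using the last FF D9 is a simplification, and the sample stream with an HTTP header in front is made up. `carve_jpeg` is an illustrative name.

```python
def carve_jpeg(stream: bytes):
    """Return the bytes from the JPEG start-of-image marker (FF D8 FF) through the last
    end-of-image marker (FF D9), dropping protocol data around it; None if absent."""
    start = stream.find(b"\xff\xd8\xff")
    if start == -1:
        return None
    end = stream.rfind(b"\xff\xd9")
    if end == -1 or end < start:
        return None
    return stream[start:end + 2]


stream = (b"HTTP/1.1 200 OK\r\nContent-Type: image/jpeg\r\n\r\n"
          + b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"      # SOI + APP0 segment with the JFIF tag
          + b"...image data..."
          + b"\xff\xd9")
image = carve_jpeg(stream)
print(image[:3].hex(), b"JFIF" in image)   # ffd8ff True
```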
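Wireless networks use the 802.11 protocol. The US permits only 11 channels (in the 2.4 GHz band); each WLAN operates on only one of them at a time, and you can listen to only one channel at a time.
Of course, dedicated wireless scanning tools (such as Kismet) use channel hopping, switching between channels quickly to approximate listening to several channels at once.
Besides the effect of walls and reflective surfaces on the signal, interference between adjacent channels must also be considered.
Troubleshooting wireless networks is more complex: you need channel analysis to see whether problems such as packet loss are caused by channel interference.
I found a real treasure of a blog, https://www.aneasystone.com/archives/2016/08/wireless-analysis-one-monitoring.html, by an author with a solid theoretical background who also loves hands-on experiments. Besides wireless security, the author has written about other topics too.
The book explains wireless sniffing on Windows (which requires the AirPcap USB adapter, because Windows NIC (network interface controller) drivers generally don't allow switching to monitor mode, even when the NIC itself supports it) and on Linux (as long as the NIC supports monitor mode, you just switch it on). See the book and the blog above for the details.
The difference between wireless and wired packets is that a wireless packet has an 802.11 header at the data link layer.
802.11 has three frame types: management, control, and data.
Taking a beacon frame as an example, its 802.11 frame carries the beacon interval, the SSID of the WAP (wireless access point), the WAP's device type, the channel it is on, and other information.
Other important parameters Wireshark provides for captured wireless packets include RSSI (received signal strength indication) and TX rate (transmission rate).
Wireless captures are harder to filter than wired ones: there are only a dozen or so channels in the air, and listening on one channel captures everything on it, whereas on a wired network you can easily filter for the two endpoints of a conversation.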
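About SSID and BSSID:
For example, a company with a large office installs several wireless access points (APs or wireless routers). Employees only need to know one SSID to connect to the wireless network anywhere in the office. The BSSID is simply the MAC address of each access point. As an employee moves around the office, the SSID stays the same, but the BSSID keeps changing as they switch between access points. (https://www.zhihu.com/question/24362037/answer/36048064)
In many cases, the BSSID can be thought of as the WAP's MAC address.
WEP, Wired Equivalent Privacy
WPA, Wi-Fi Protected Access
There is also WPA2.
The WEP process (one request/response exchange yields the result).
The WPA process (success takes two exchanges; failure takes more).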
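tcpdump & WinDump: tcpdump works well with sed (insert, delete, search, replace) and awk (regular expressions, spreadsheet-like processing); WinDump is the Windows version.
Cain & Abel: a toolkit; one of its uses is ARP cache poisoning.
Scapy: a Python library for packet crafting.
NetDude: packet crafting and editing on Linux, with a GUI.
Colasoft Packet Builder: packet crafting and editing on Windows, with a GUI.
CloudShark: share captures online.
pcapr: share demo captures of various protocols online.
NetworkMiner: mainly for forensics; good at parsing PCAP files.
ngrep: searches PCAP files; useful when a filter would be too complex or simply can't express the search.
libpcap: a C library for packet capture; both Wireshark and tcpdump build on it.
hping: packet crafting, editing, and sending.
Domain Dossier: looks up registration information for domains and IP addresses.
Wireshark website: official documentation and source code.
SANS Security Intrusion Detection In-Depth Course: a course from the author's employer.
http://www.chrissanders.org/ : the author's blog
http://www.packetstan.com/ : a blog the author strongly recommends
http://www.wiresharktraining.com/ : a Wireshark evangelist the author recommends
http://www.iana.org/ : look up port numbers 1-1023 and RFCs
TCP/IP Illustrated by Richard Stevens: the TCP/IP bible
The TCP/IP Guide by Charles Kozierok: great for visual learners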