lyjinger's reading notes: Beej's Guide to Network Programming Using Internet Sockets

Original book information:
Title: Beej's Guide to Network Programming Using Internet Sockets
Version: 2.4.0
Date: May 6, 2007
Author: Brian “Beej Jorgensen” Hall

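Note: I originally wanted to read W. Richard Stevens's Unix Network Programming, but I couldn't find an English e-book of it and came across this guide instead, so I took the chance to see how someone else approaches network programming. Overall the content is average and not very deep, which makes it suitable for beginners, but the parts on transmitting binary data and on sending and receiving complete packets are quite practical.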

Reading notes:

What is a socket?
  a way to speak to other programs using standard Unix file descriptors.

Where do I get this file descriptor for network communication, Mr. Smarty-Pants?
  You make a call to the socket() system routine. It returns the socket descriptor

If it's a file descriptor, why in the name of Neptune can't I just use the normal
read() and write() calls to communicate through the socket?
  The short answer is, “You can!” The longer answer is, “You can, but send() and
recv() offer much greater control over your data transmission.”

  Stream sockets are reliable two-way connected communication streams. They will also be error free.

What uses stream sockets?
  All the characters you type need to arrive in the same order you type them

How do stream sockets achieve this high level of data transmission quality?
  They use a protocol called TCP, which makes sure your data arrives sequentially and error-free.

  IP(Internet Protocol) deals primarily with Internet routing and is not generally responsible for data integrity.

  if you send a datagram, it may arrive. It may arrive out of order. If it arrives, the data within the packet will be
error-free.

  Datagram sockets also use IP for routing, but they don't use TCP; they use UDP.

Why are they connectionless?
  You just build a packet, slap an IP header on it with destination information, and send it out. No connection needed.

  the tftp protocol says that for each packet that gets sent, the recipient has to send back a packet that
says, “I got it!” (an “ACK” packet.) If the sender of the original packet gets no reply in, say,
five seconds, he'll re-transmit the packet until he finally gets an ACK. This acknowledgment
procedure is very important when implementing SOCK_DGRAM applications.
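
The acknowledgment procedure above can be sketched roughly as follows. This is only a minimal sketch, not TFTP itself: the helper name send_with_ack() is made up for illustration, it assumes the datagram socket has already been connect()ed to its peer, it uses select() for the timeout, and it treats any datagram that comes back as the ACK.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical helper: send msg on a connected datagram socket and wait up to
 * timeout_ms for a reply, retransmitting up to max_tries times.
 * Returns 0 once a reply arrives, -1 on error or if every try times out. */
int send_with_ack(int fd, const void *msg, size_t len, int max_tries, int timeout_ms)
{
    assert(msg != NULL && max_tries > 0 && timeout_ms >= 0);

    for (int attempt = 0; attempt < max_tries; attempt++) {
        if (send(fd, msg, len, 0) == -1)
            return -1;

        fd_set readfds;
        struct timeval tv;                 /* rebuilt each round: select() may modify it */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        int ready = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (ready == -1)
            return -1;
        if (ready > 0) {
            char ack[16];
            /* Assumption: any datagram received here counts as the ACK. */
            return recv(fd, ack, sizeof ack, 0) >= 0 ? 0 : -1;
        }
        /* ready == 0: timed out, so loop around and retransmit. */
    }
    return -1;
}
```

A real protocol would also number its packets so that a late ACK for an earlier packet is not mistaken for the current one.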

  Basically, it says this: a packet is born, the packet is wrapped (“encapsulated”) in a header
by the first protocol (say, the TFTP protocol), then the whole thing (TFTP header included) is encapsulated
again by the next protocol (say, UDP), then again by the next (IP), then again by the final protocol
on the hardware (physical) layer (say, Ethernet).
  When another computer receives the packet, the hardware strips the Ethernet header, the
kernel strips the IP and UDP headers, the TFTP program strips the TFTP header, and it finally
has the data.

Remember this for network class exams:
  Application
  Presentation
  Session
  Transport
  Network
  Data Link
  Physical

A layered model more consistent with Unix might be:
  Application Layer (telnet, ftp, etc.)
  Host-to-Host Transport Layer (TCP, UDP)
  Internet Layer (IP and routing)
  Network Access Layer (Ethernet, ATM, or whatever)

See how much work there is in building a simple packet?
  All you have to do for stream sockets is send() the data out. All you have to do for datagram sockets
is encapsulate the packet in the method of your choosing and sendto() it out. The kernel builds the Transport Layer
and Internet Layer on for you and the hardware does the Network Access Layer.

  there are two byte orderings: most significant byte first, or least significant
byte first. The former is called “Network Byte Order”. Some machines store their numbers
internally in Network Byte Order, some don't.

struct sockaddr {
    unsigned short sa_family; // address family, AF_xxx
    char sa_data[14]; // 14 bytes of protocol address
};
sa_family can be a variety of things, but it'll be AF_INET for everything we do in this
document.
sa_data contains a destination address and port number for the socket.

struct sockaddr_in {
    short int sin_family; // Address family
    unsigned short int sin_port; // Port number
    struct in_addr sin_addr; // Internet address
    unsigned char sin_zero[8]; // Same size as struct sockaddr
};
sin_zero (which is included to pad the structure to the length of a struct sockaddr) should
be set to all zeros with the function memset().
sin_family corresponds to sa_family in a struct sockaddr and should be set to “AF_INET”.
Finally, the sin_port and sin_addr must be in Network Byte Order!

// Internet address (a structure for historical reasons)
struct in_addr {
    uint32_t s_addr; // that's a 32-bit int (4 bytes)
};

  So if you have declared ina to be of type struct sockaddr_in, then ina.sin_addr.s_addr
references the 4-byte IP address (in Network Byte Order).

You can use every combination of “n”, “h”, “s”, and “l” you want, not counting the really
stupid ones.
    htons()    host to network short
    htonl()    host to network long
    ntohs()    network to host short
    ntohl()    network to host long

Remember: put your bytes in Network Byte Order before you put them on the network.
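
To see what the conversion functions do to the bytes, here is a small sketch. The helper names put_u16() and get_u16() are hypothetical; they only wrap htons() and ntohs() so the byte layout in a buffer can be inspected.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a 16-bit value in buf in Network Byte Order. */
void put_u16(unsigned char *buf, uint16_t host_value)
{
    assert(buf != NULL);
    uint16_t net_value = htons(host_value);     /* host to network short */
    memcpy(buf, &net_value, sizeof net_value);
}

/* Hypothetical helper: read a 16-bit Network Byte Order value back from buf. */
uint16_t get_u16(const unsigned char *buf)
{
    assert(buf != NULL);
    uint16_t net_value;
    memcpy(&net_value, buf, sizeof net_value);
    return ntohs(net_value);                    /* network to host short */
}
```

Whatever the host's own ordering, put_u16(buf, 0x1234) leaves the most significant byte 0x12 in buf[0], which is what "most significant byte first" means.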

why do sin_addr and sin_port need to be in Network Byte Order in a
struct sockaddr_in, but sin_family does not?
  The answer: sin_addr and sin_port get encapsulated in the packet at the IP and UDP layers, respectively.
Thus, they must be in Network Byte Order. However, the sin_family field is only used by the kernel to determine
what type of address the structure contains, so it must be in Host Byte Order. Also, since
sin_family does not get sent out on the network, it can be in Host Byte Order.

    ina.sin_addr.s_addr = inet_addr("10.12.110.57");
Now, the above code snippet isn't very robust because there is no error checking.

    struct sockaddr_in my_addr;
    my_addr.sin_family = AF_INET; // host byte order
    my_addr.sin_port = htons(MYPORT); // short, network byte order
    inet_aton("10.12.110.57", &(my_addr.sin_addr));
    memset(my_addr.sin_zero, '\0', sizeof my_addr.sin_zero);
  Unfortunately, not all platforms implement inet_aton() so, although its use is preferred,
the older more common inet_addr() is used in this guide.
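
One way to add the missing error checking is sketched below. Note that this swaps in inet_pton(), a POSIX function that these notes do not cover, because it reports bad input through its return value; make_addr() is a hypothetical helper name.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill addr from a dotted-quad string and a host-order port.
 * Returns 0 on success, -1 if ip is not a valid IPv4 address. */
int make_addr(const char *ip, unsigned short port, struct sockaddr_in *addr)
{
    assert(ip != NULL && addr != NULL);
    memset(addr, 0, sizeof *addr);        /* this also zeroes sin_zero */
    addr->sin_family = AF_INET;           /* host byte order */
    addr->sin_port = htons(port);         /* network byte order */
    if (inet_pton(AF_INET, ip, &addr->sin_addr) != 1)
        return -1;                        /* malformed address string */
    return 0;
}
```

With inet_addr() the error value is -1, which is the same bit pattern as 255.255.255.255, so a checked parser like this avoids that ambiguity.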

    printf("%s", inet_ntoa(ina.sin_addr));
  Note that inet_ntoa() takes a struct in_addr as an
argument, not a long. Also notice that it returns a pointer to a char. This points to a statically
stored char array within inet_ntoa() so that each time you call inet_ntoa() it will
overwrite the last IP address you asked for.
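
Because of that static buffer, it is safer to copy the result out before calling inet_ntoa() again. A minimal sketch, with addr_to_text() as a hypothetical helper name:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: copy the dotted-quad text for addr into out so that it
 * survives later inet_ntoa() calls. Returns 0 on success, -1 if out is too small. */
int addr_to_text(struct in_addr addr, char *out, size_t outlen)
{
    assert(out != NULL);
    const char *text = inet_ntoa(addr);   /* static buffer, overwritten by the next call */
    if (strlen(text) >= outlen)
        return -1;
    strcpy(out, text);
    return 0;
}
```

With two separate buffers, both addresses can then be printed in a single printf() without one overwriting the other.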

  Lots of places have a firewall that hides the network from the rest of the world for their
own protection. And often times, the firewall translates “internal” IP addresses to “external”
(that everyone else in the world knows) IP addresses using a process called Network Address
Translation, or NAT.

  The details of which private network numbers are available for you to use are outlined in RFC 1918,
but some common ones you'll see are 10.x.x.x and 192.168.x.x, where x is 0-255, generally. Less common is
172.y.x.x, where y goes between 16 and 31.

socket()—Get the File Descriptor!

int socket(int domain, int type, int protocol);
First, domain should be set to “PF_INET”.
Next, the type argument tells the kernel what kind of socket this is: SOCK_STREAM or SOCK_DGRAM.
Finally, just set protocol to “0” to have socket() choose the correct protocol based on the type.

  the most correct thing to do is to use AF_INET in your struct sockaddr_in and PF_INET in your call to socket().

bind()—What port am I on?

  The port number is used by the kernel to match an incoming packet to a certain process's socket descriptor. If
you're going to only be doing a connect(), this may be unnecessary.

int bind(int sockfd, struct sockaddr *my_addr, int addrlen);
sockfd is the socket file descriptor returned by socket().
my_addr is a pointer to a struct sockaddr that contains information about your address, namely, port and IP address.
addrlen can be set to sizeof(struct sockaddr).

    my_addr.sin_port = htons(0); // choose an unused port at random
    my_addr.sin_addr.s_addr = htonl(INADDR_ANY); // use my IP address
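
Put together, letting the kernel pick the port might look like the sketch below. bind_any_port() is a hypothetical helper, and it uses getsockname(), which these notes do not cover, to find out which port was actually chosen.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a socket of the given type, bind it to any local
 * address and a kernel-chosen port, and report that port in host byte order.
 * Returns the socket descriptor, or -1 on error. */
int bind_any_port(int type, unsigned short *port_out)
{
    assert(port_out != NULL);
    int fd = socket(PF_INET, type, 0);
    if (fd == -1)
        return -1;

    struct sockaddr_in my_addr;
    socklen_t len = sizeof my_addr;
    memset(&my_addr, 0, sizeof my_addr);
    my_addr.sin_family = AF_INET;
    my_addr.sin_port = htons(0);                   /* choose an unused port */
    my_addr.sin_addr.s_addr = htonl(INADDR_ANY);   /* use my IP address */

    if (bind(fd, (struct sockaddr *)&my_addr, sizeof my_addr) == -1 ||
        getsockname(fd, (struct sockaddr *)&my_addr, &len) == -1) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(my_addr.sin_port);
    return fd;
}
```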

  All ports below 1024 are RESERVED (unless you're the superuser)! You can have
any port number above that, right up to 65535 (provided they aren't already being used by
another program.)

“Address already in use.”
  You can either wait for it to clear (a minute or so), or add code to your program allowing it to
reuse the port, like this:
    int yes=1;
    //char yes='1'; // Solaris people use this
    // lose the pesky "Address already in use" error message
    if (setsockopt(listener,SOL_SOCKET,SO_REUSEADDR,&yes,sizeof(int)) == -1) {
        perror("setsockopt");
        exit(1);
    }

  When you call connect(), it'll check to see if the socket is unbound, and will bind() it to an unused local
port if necessary.

connect()—Hey, you!

int connect(int sockfd, struct sockaddr *serv_addr, int addrlen);
sockfd is our friendly neighborhood socket file descriptor, as returned by the socket() call,
serv_addr is a struct sockaddr containing the destination port and IP address, and
addrlen can be set to sizeof(struct sockaddr).

listen()—Will somebody please call me?

int listen(int sockfd, int backlog);
sockfd is the usual socket file descriptor from the socket() system call.
backlog is the number of connections allowed on the incoming queue.
  What does that mean? Well, incoming connections are going to wait in this queue until you accept()
them (see below) and this is the limit on how many can queue up. Most systems silently limit this number
to about 20; you can probably get away with setting it to 5 or 10.

accept()—“Thank you for calling port 3490.”
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
sockfd is the listen()ing socket descriptor.
addr will usually be a pointer to a local struct sockaddr_in. This is where the information about the incoming
connection will go (and with it you can determine which host is calling you from which port).
addrlen is a local integer variable that should be set to sizeof(struct sockaddr_in)
before its address is passed to accept(). Accept will not put more than that many bytes into
addr. If it puts fewer in, it'll change the value of addrlen to reflect that.
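
The whole bind(), listen(), connect(), accept() sequence can be tried inside one process over the loopback interface. The sketch below is a demo only: accept_demo() is a hypothetical name, INADDR_LOOPBACK and getsockname() are not covered in these notes, and a real server would of course accept connections from other programs.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: listen on a kernel-chosen loopback port, connect to it from
 * the same process, and accept() that connection.
 * Returns the caller's port as reported by accept() (host byte order), or -1. */
int accept_demo(void)
{
    int result = -1, client = -1, conn = -1;
    struct sockaddr_in addr, their_addr;
    socklen_t len = sizeof addr;
    socklen_t sin_size = sizeof their_addr;     /* must be set before accept() */
    int listener = socket(PF_INET, SOCK_STREAM, 0);
    if (listener == -1)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(listener, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        listen(listener, 10) == -1 ||            /* backlog of 10 pending connections */
        getsockname(listener, (struct sockaddr *)&addr, &len) == -1)
        goto done;

    client = socket(PF_INET, SOCK_STREAM, 0);
    if (client == -1 || connect(client, (struct sockaddr *)&addr, sizeof addr) == -1)
        goto done;                               /* the connection waits in the queue */

    conn = accept(listener, (struct sockaddr *)&their_addr, &sin_size);
    if (conn == -1)
        goto done;
    assert(sin_size <= sizeof their_addr);       /* accept() never writes more than asked */
    result = ntohs(their_addr.sin_port);

done:
    if (conn != -1) close(conn);
    if (client != -1) close(client);
    close(listener);
    return result;
}
```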

send() and recv()—Talk to me, baby!
int send(int sockfd, const void *msg, int len, int flags);
sockfd is the socket descriptor you want to send data to.
msg is a pointer to the data you want to send,
len is the length of that data in bytes. Just set flags to 0.

  if the value returned by send() doesn't match the value in len, it's up to you to send the rest of the
string.

int recv(int sockfd, void *buf, int len, unsigned int flags);
sockfd is the socket descriptor to read from,
buf is the buffer to read the information into,
len is the maximum length of the buffer,
and flags can again be set to 0.
 
  recv() can return 0. This can mean only one thing: the remote side has closed the
connection on you!
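
A receive loop therefore has three cases to handle: an error (-1), a closed connection (0), and some data (more than 0). A minimal sketch, with recv_until_closed() as a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: read from a stream socket until the peer closes the
 * connection or buf is full. Returns the total bytes received, or -1 on error. */
long recv_until_closed(int fd, char *buf, size_t cap)
{
    assert(buf != NULL);
    size_t total = 0;
    while (total < cap) {
        ssize_t n = recv(fd, buf + total, cap - total, 0);
        if (n == -1)
            return -1;          /* a real error */
        if (n == 0)
            break;              /* the remote side has closed the connection */
        total += (size_t)n;     /* recv() may hand back less than was asked for */
    }
    return (long)total;
}
```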

sendto() and recvfrom()—Talk to me, DGRAM-style
int sendto(int sockfd, const void *msg, int len, unsigned int flags,
           const struct sockaddr *to, socklen_t tolen);
to is a pointer to a struct sockaddr which contains the destination IP address and port.
tolen, an int deep-down, can simply be set to sizeof(struct sockaddr).

int recvfrom(int sockfd, void *buf, int len, unsigned int flags,
             struct sockaddr *from, int *fromlen);
from is a pointer to a local struct sockaddr that will be filled with the IP address and port of the originating
machine.
fromlen is a pointer to a local int that should be initialized to sizeof(struct sockaddr).
When the function returns, fromlen will contain the length of the address actually stored in from.

  if you connect() a datagram socket, you can then simply use send() and
recv() for all your transactions. The socket itself is still a datagram socket and the packets still
use UDP, but the socket interface will automatically add the destination and source information
for you.
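
Here is a rough sketch of a connect()ed datagram socket, run over loopback inside one process. connected_udp_demo() is a hypothetical name, and INADDR_LOOPBACK and getsockname() are not covered in these notes.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: the sender connect()s its UDP socket to the receiver and
 * then uses plain send(). The received datagram is copied into out.
 * Returns the number of bytes received, or -1 on error. */
int connected_udp_demo(const char *msg, char *out, size_t outlen)
{
    assert(msg != NULL && out != NULL);
    int result = -1;
    int receiver = socket(PF_INET, SOCK_DGRAM, 0);
    int sender = socket(PF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (receiver != -1 && sender != -1 &&
        bind(receiver, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        getsockname(receiver, (struct sockaddr *)&addr, &len) == 0 &&
        connect(sender, (struct sockaddr *)&addr, sizeof addr) == 0 &&  /* fixes the destination */
        send(sender, msg, strlen(msg), 0) != -1)                        /* no sendto() needed */
        result = (int)recv(receiver, out, outlen, 0);

    if (receiver != -1) close(receiver);
    if (sender != -1) close(sender);
    return result;
}
```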

close() and shutdown()—Get outta my face!
close(sockfd);
int shutdown(int sockfd, int how);
sockfd is the socket file descriptor you want to shutdown, and how is one of the
following:
    0 Further receives are disallowed
    1 Further sends are disallowed
    2 Further sends and receives are disallowed (like close())

  It's important to note that shutdown() doesn't actually close the file descriptor—it just
changes its usability. To free a socket descriptor, you need to use close().
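
The difference can be seen with a pair of connected stream sockets. In this sketch half_close_demo() is a hypothetical name, and SHUT_WR is the symbolic constant for how == 1.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: a and b are the two ends of a connected stream socket.
 * Returns 0 if the half-close behaved as described above, -1 otherwise.
 * Both descriptors are closed before returning. */
int half_close_demo(int a, int b)
{
    char c = 0;
    assert(a >= 0 && b >= 0);

    int ok = shutdown(a, SHUT_WR) == 0     /* how == 1: further sends on a are disallowed */
          && recv(b, &c, 1, 0) == 0        /* b sees the end of the stream */
          && send(b, "x", 1, 0) == 1       /* the other direction still works */
          && recv(a, &c, 1, 0) == 1        /* a can still receive */
          && c == 'x';

    /* shutdown() freed nothing: both descriptors still need close(). */
    int closed_a = close(a);
    int closed_b = close(b);
    return (ok && closed_a == 0 && closed_b == 0) ? 0 : -1;
}
```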

getpeername()—Who are you?
  The function getpeername() will tell you who is at the other end of a connected stream
socket.

int getpeername(int sockfd, struct sockaddr *addr, int *addrlen);
sockfd is the descriptor of the connected stream socket,
addr is a pointer to a struct sockaddr (or a struct sockaddr_in) that will hold the information about the other
side of the connection,
and addrlen is a pointer to an int, that should be initialized to sizeof(struct sockaddr).

gethostname()—Who am I?

int gethostname(char *hostname, size_t size);
hostname is a pointer to an array of chars that will contain the hostname upon the function's return,
and size is the length in bytes of the hostname array.

struct hostent *gethostbyname(const char *name);
struct hostent {
    char *h_name;
    char **h_aliases;
    int h_addrtype;
    int h_length;
    char **h_addr_list;
};
#define h_addr h_addr_list[0]
    h_name        Official name of the host.
    h_aliases     A NULL-terminated array of alternate names for the host.
    h_addrtype    The type of address being returned; usually AF_INET.
    h_length      The length of the address in bytes.

    printf("Host name : %s\n", h->h_name);
    printf("IP Address : %s\n", inet_ntoa(*((struct in_addr *)h->h_addr)));

  With gethostbyname(), you can't use perror() to print error message (since errno is
not used). Instead, call herror().
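
A lookup with the herror() error path might be sketched like this. first_ipv4() is a hypothetical helper name; it copies the address out of the hostent rather than casting the pointer, which is a design choice of this sketch and not something the guide requires.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: look up name and copy its first IPv4 address, as text,
 * into out. Returns 0 on success, -1 on failure. */
int first_ipv4(const char *name, char *out, size_t outlen)
{
    assert(name != NULL && out != NULL);
    struct hostent *h = gethostbyname(name);
    if (h == NULL) {
        herror("gethostbyname");          /* not perror(): errno is not used */
        return -1;
    }
    if (h->h_addrtype != AF_INET || h->h_addr_list[0] == NULL)
        return -1;

    struct in_addr addr;
    memcpy(&addr, h->h_addr_list[0], sizeof addr);
    const char *text = inet_ntoa(addr);   /* static buffer, so copy it out */
    if (strlen(text) >= outlen)
        return -1;
    strcpy(out, text);
    return 0;
}
```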

  The basic routine is: server will wait for a connection, accept() it, and
fork() a child process to handle it.

Blocking
  “block” is techie jargon for “sleep”.

  If you don't want a socket to be blocking, you have to make a call to fcntl():
    sockfd = socket(PF_INET, SOCK_STREAM, 0);
    fcntl(sockfd, F_SETFL, O_NONBLOCK);
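
A slightly more careful variant reads the current flags first so that nothing else gets cleared, and then tells "nothing to read yet" apart from a real error. Both helper names below are hypothetical; the F_GETFL step and the errno check are additions of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>

/* Hypothetical helper: add O_NONBLOCK to fd while keeping its other status flags.
 * Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: after a call on a non-blocking socket returned -1, tells
 * whether it only meant "nothing ready yet" rather than a real failure. */
int would_block(void)
{
    return errno == EAGAIN || errno == EWOULDBLOCK;
}
```

On a non-blocking socket with no data waiting, recv() returns -1 right away and would_block() is true, instead of the call going to sleep.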

select()—Synchronous I/O Multiplexing
  select() gives you the power to monitor several sockets at the same time. It'll tell you
which ones are ready for reading, which are ready for writing, and which sockets have raised
exceptions, if you really want to know that.

int select(int numfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);
numfds should be set to the value of the highest file descriptor plus one.

    FD_SET(int fd, fd_set *set); Add fd to the set.
    FD_CLR(int fd, fd_set *set); Remove fd from the set.
    FD_ISSET(int fd, fd_set *set); Return true if fd is in the set.
    FD_ZERO(fd_set *set); Clear all entries from the set.

struct timeval {
    int tv_sec; // seconds
    int tv_usec; // microseconds
};

  If you set the fields in your struct timeval to 0, select()
will timeout immediately, effectively polling all the file descriptors in your sets. If you set the
parameter timeout to NULL, it will never timeout, and will wait until the first file descriptor is
ready.

  Some Unices update the time in your struct timeval to reflect the amount of time still
remaining before a timeout. But others do not.

What happens if a socket in the read set closes the connection?
  Well, in that case, select() returns with that socket descriptor set as “ready to read”. When you actually
do recv() from it, recv() will return 0.

  if you have a socket that is listen()ing, you can check to see if there is a new connection
by putting that socket's file descriptor in the readfds set.

Handling Partial send()s
int sendall(int s, char *buf, int *len)
{
    int total = 0; // how many bytes we've sent
    int bytesleft = *len; // how many we have left to send
    int n;
    while(total < *len) {
        n = send(s, buf+total, bytesleft, 0);
        if (n == -1) { break; }
        total += n;
        bytesleft -= n;
    }
    *len = total; // return number actually sent here
    return n==-1?-1:0; // return -1 on failure, 0 on success
}

Serialization—How to Pack Data
send some “binary” data
  1. Convert the number into text with a function like sprintf(), then send the text. The
receiver will parse the text back into a number using a function like strtol().
  2. Just send the data raw, passing a pointer to the data to send().
  3. Encode the number into a portable binary form. The receiver will decode it.

  The first method, encoding the numbers as text before sending, has the advantage that you
can easily print and read the data that's coming over the wire.
  However, it has the disadvantage that it is slow to convert, and the results almost
always take up more space than the original number!

  Method two: passing the raw data. This one is quite easy (but dangerous!): just take a
pointer to the data to send, and call send with it.
  It turns out that not all architectures represent a double (or an int, for that matter) with
the same bit representation or even the same byte ordering! The code is decidedly non-portable.

  The thing to do is to pack the data into a known format and send that over the wire for
decoding.
uint32_t htonf(float f)
{
    uint32_t p;
    uint32_t sign;
    if (f < 0) { sign = 1; f = -f; }
    else { sign = 0; }
    p = ((((uint32_t)f)&0x7fff)<<16) | (sign<<31); // whole part and sign
    p |= (uint32_t)(((f - (int)f) * 65536.0f))&0xffff; // fraction
    return p;
}

float ntohf(uint32_t p)
{
    float f = ((p>>16)&0x7fff); // whole part
    f += (p&0xffff) / 65536.0f; // fraction
    if (((p>>31)&0x1) == 0x1) { f = -f; } // sign bit set
    return f;
}
  The above code is sort of a naive implementation that stores a float in a 32-bit number.
The high bit (31) is used to store the sign of the number (“1” means negative), and the next
fifteen bits (30-16) are used to store the whole number portion of the float. Finally, the
remaining bits (15-0) are used to store the fractional portion of the number.

  The Standard for storing floating point numbers is known as IEEE-754.

  the best way to send the struct over the wire is to pack each field
independently and then unpack them into the struct when they arrive on the other side.
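
Field-by-field packing might look like the sketch below. The struct reading record and its two fields are invented purely for illustration; the point is that each field is converted to Network Byte Order and copied to a fixed offset, so struct padding never reaches the wire.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record, used only for illustration. */
struct reading {
    uint16_t sensor_id;
    uint32_t value;
};

#define READING_WIRE_SIZE 6   /* 2 + 4 bytes, with no padding on the wire */

/* Pack r into buf, which must hold at least READING_WIRE_SIZE bytes. */
void pack_reading(unsigned char *buf, const struct reading *r)
{
    assert(buf != NULL && r != NULL);
    uint16_t id = htons(r->sensor_id);
    uint32_t value = htonl(r->value);
    memcpy(buf, &id, 2);
    memcpy(buf + 2, &value, 4);
}

/* Unpack READING_WIRE_SIZE bytes from buf into r. */
void unpack_reading(const unsigned char *buf, struct reading *r)
{
    assert(buf != NULL && r != NULL);
    uint16_t id;
    uint32_t value;
    memcpy(&id, buf, 2);
    memcpy(&value, buf + 2, 4);
    r->sensor_id = ntohs(id);
    r->value = ntohl(value);
}
```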

  What does it really mean to encapsulate data, anyway? In the simplest case, it means you'll
stick a header on there with either some identifying information or a packet length, or both.
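
As one possible shape for such a header, the sketch below prefixes each payload with a 16-bit length in Network Byte Order. This frame format and the names frame_encode() and frame_complete() are assumptions made for illustration, not a format defined in the guide.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_HEADER_SIZE 2   /* hypothetical header: 16-bit payload length */

/* Write header plus payload into out.
 * Returns the total frame size, or 0 if out is too small. */
size_t frame_encode(unsigned char *out, size_t outcap, const void *payload, uint16_t len)
{
    assert(out != NULL && payload != NULL);
    if (outcap < (size_t)FRAME_HEADER_SIZE + len)
        return 0;
    uint16_t netlen = htons(len);
    memcpy(out, &netlen, FRAME_HEADER_SIZE);
    memcpy(out + FRAME_HEADER_SIZE, payload, len);
    return (size_t)FRAME_HEADER_SIZE + len;
}

/* Given the first `have` bytes received so far, return the payload length if a
 * whole frame has arrived, or -1 if more bytes are still needed. */
long frame_complete(const unsigned char *buf, size_t have)
{
    assert(buf != NULL);
    uint16_t netlen;
    if (have < (size_t)FRAME_HEADER_SIZE)
        return -1;
    memcpy(&netlen, buf, FRAME_HEADER_SIZE);
    if (have < (size_t)FRAME_HEADER_SIZE + ntohs(netlen))
        return -1;
    return ntohs(netlen);
}
```

The receiver keeps calling recv() and appending to its buffer until frame_complete() stops returning -1, which is how a stream of bytes gets cut back into whole packets.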

send data to multiple hosts at the same time!
  With UDP (only UDP, not TCP) and standard IPv4, this is done through a mechanism
called broadcasting.

  You have to set the socket option SO_BROADCAST before you can send a broadcast packet out on the network.

how do you specify the destination address for a broadcast message?
There are two common ways.
1. Send the data to your broadcast address. This is your network number with all
one-bits set for the host portion of the address. For instance, at home my network
is 192.168.1.0, my netmask is 255.255.255.0, so the last byte of the address is my
host number (because the first three bytes, according to the netmask, are the network
number). So my broadcast address is 192.168.1.255.
2. Send the data to the “global” broadcast address. This is 255.255.255.255, aka
INADDR_BROADCAST. Many machines will automatically convert this to your network's
broadcast address, but some won't. It varies.
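
The first method boils down to setting every host bit to one, that is, address OR (NOT netmask). A small sketch follows, with broadcast_for() as a hypothetical helper name; it uses inet_pton() and inet_ntop(), which are not covered in these notes, for the text conversions.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: compute the broadcast address for ip under netmask (both
 * dotted-quad strings) and write it, as text, into out.
 * Returns 0 on success, -1 on bad input or a too-small buffer. */
int broadcast_for(const char *ip, const char *netmask, char *out, socklen_t outlen)
{
    assert(ip != NULL && netmask != NULL && out != NULL);
    struct in_addr addr, mask, bcast;
    if (inet_pton(AF_INET, ip, &addr) != 1 || inet_pton(AF_INET, netmask, &mask) != 1)
        return -1;
    bcast.s_addr = addr.s_addr | ~mask.s_addr;     /* set every host bit to 1 */
    return inet_ntop(AF_INET, &bcast, out, outlen) == NULL ? -1 : 0;
}
```

For the home network in the example, broadcast_for("192.168.1.0", "255.255.255.0", ...) yields 192.168.1.255.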

So what happens if you try to send data on the broadcast address without first setting the
SO_BROADCAST socket option?
  The sendto() call fails with a “Permission denied” error. In fact, that's the only difference between a UDP
application that can broadcast and one that can't.
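
Setting the option is a single setsockopt() call. A minimal sketch, with set_broadcast() as a hypothetical helper name:

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical helper: allow (on != 0) or forbid (on == 0) sending broadcast
 * packets on a datagram socket. Returns 0 on success, -1 on error. */
int set_broadcast(int fd, int on)
{
    assert(fd >= 0);
    int flag = on ? 1 : 0;
    return setsockopt(fd, SOL_SOCKET, SO_BROADCAST, &flag, sizeof flag);
}
```

After set_broadcast(sockfd, 1) succeeds, sendto() with a broadcast destination address is permitted on that socket.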

  be careful with broadcast packets. Since every machine on the LAN will be forced
to deal with the packet whether it recvfrom()s it or not, it can present quite a load to the entire
computing network.

How can I view the routing table?
  Run the route command (in /sbin on most Linuxes) or the command netstat -r.

How can I tell if the remote side has closed connection?
  You can tell because recv() will return 0.

Why does select() keep falling out on a signal?
select_restart:
if ((err = select(fdmax+1, &readfds, NULL, NULL, NULL)) == -1) {
    if (errno == EINTR) {
        // some signal just interrupted us, so restart
        goto select_restart;
    }
    // handle the real error here:
    perror("select");
}

I'm sending a slew of data, but when I recv(), it only receives 536 bytes or 1460 bytes at
a time. But if I run it on my local machine, it receives all the data at the same time. What's
going on?
  You're hitting the MTU—the maximum size the physical medium can handle. On the local
machine, you're using the loopback device which can handle 8K or more no problem. But on
Ethernet, which can only handle 1500 bytes with a header, you hit that limit. Over a modem,
with 576 MTU (again, with header), you hit the even lower limit.
