Tuesday, March 31, 2009

Yahoo development session notes: YQL, Flickr, OAuth, YAP

A few notes regarding Yahoo session:
YQL - very nice API, successor of the Yahoo pipes. With you could read and parse user social data, parse RSS or Twiter feeds. Feature that I most liked is possibility to parse recent multiple RSS feeds and extract information you need with a simple query like this:

select title from rss where url="http://rss.news.yahoo.com/rss/topstories" and title like '%GM%'
Currently it's limited up to 10000 requests a day from a single IP address.

OAth - yahoo platform for authentication, standard that all big web guys( Facebook, Google, etc. ) are going to adopt. Very nice feature for web sites that don't want to 'outsource' authentication.

Sunday, March 29, 2009

SMTP properties

Name Type Description
mail.smtp.user String Default user name for SMTP.
mail.smtp.host String The SMTP server to connect to.
mail.smtp.port int The SMTP server port to connect to, if the connect() method doesn't explicitly specify one. Defaults to 25.
mail.smtp.connectiontimeout int Socket connection timeout value in milliseconds. Default is infinite timeout.
mail.smtp.timeout int Socket I/O timeout value in milliseconds. Default is infinite timeout.
mail.smtp.from String Email address to use for SMTP MAIL command. This sets the envelope return address. Defaults to msg.getFrom() or InternetAddress.getLocalAddress(). NOTE: mail.smtp.user was previously used for this.
mail.smtp.localhost String Local host name used in the SMTP HELO or EHLO command. Defaults to InetAddress.getLocalHost().getHostName(). Should not normally need to be set if your JDK and your name service are configured properly.
mail.smtp.localaddress String Local address (host name) to bind to when creating the SMTP socket. Defaults to the address picked by the Socket class. Should not normally need to be set, but useful with multi-homed hosts where it's important to pick a particular local address to bind to.
mail.smtp.localport int Local port number to bind to when creating the SMTP socket. Defaults to the port number picked by the Socket class.
mail.smtp.ehlo boolean If false, do not attempt to sign on with the EHLO command. Defaults to true. Normally failure of the EHLO command will fallback to the HELO command; this property exists only for servers that don't fail EHLO properly or don't implement EHLO properly.
mail.smtp.auth boolean If true, attempt to authenticate the user using the AUTH command. Defaults to false.
mail.smtp.auth.mechanisms String If set, lists the authentication mechanisms to consider, and the order in which to consider them. Only mechanisms supported by the server and supported by the current implementation will be used. The default is "LOGIN PLAIN DIGEST-MD5", which includes all the authentication mechanisms supported by the current implementation.
mail.smtp.submitter String The submitter to use in the AUTH tag in the MAIL FROM command. Typically used by a mail relay to pass along information about the original submitter of the message. See also the setSubmitter method of SMTPMessage. Mail clients typically do not use this.
mail.smtp.dsn.notify String The NOTIFY option to the RCPT command. Either NEVER, or some combination of SUCCESS, FAILURE, and DELAY (separated by commas).
mail.smtp.dsn.ret String The RET option to the MAIL command. Either FULL or HDRS.
mail.smtp.allow8bitmime boolean If set to true, and the server supports the 8BITMIME extension, text parts of messages that use the "quoted-printable" or "base64" encodings are converted to use "8bit" encoding if they follow the RFC2045 rules for 8bit text.
mail.smtp.sendpartial boolean If set to true, and a message has some valid and some invalid addresses, send the message anyway, reporting the partial failure with a SendFailedException. If set to false (the default), the message is not sent to any of the recipients if there is an invalid recipient address.
mail.smtp.sasl.realm String The realm to use with DIGEST-MD5 authentication.
mail.smtp.quitwait boolean If set to false, the QUIT command is sent and the connection is immediately closed. If set to true (the default), causes the transport to wait for the response to the QUIT command.
mail.smtp.reportsuccess boolean If set to true, causes the transport to include an SMTPAddressSucceededException for each address that is successful. Note also that this will cause a SendFailedException to be thrown from the sendMessage method of SMTPTransport even if all addresses were correct and the message was sent successfully.
mail.smtp.socketFactory SocketFactory If set to a class that implements the javax.net.SocketFactory interface, this class will be used to create SMTP sockets. Note that this is an instance of a class, not a name, and must be set using the put method, not the setProperty method.
mail.smtp.socketFactory.class String If set, specifies the name of a class that implements the javax.net.SocketFactory interface. This class will be used to create SMTP sockets.
mail.smtp.socketFactory.fallback boolean If set to true, failure to create a socket using the specified socket factory class will cause the socket to be created using the java.net.Socket class. Defaults to true.
mail.smtp.socketFactory.port int Specifies the port to connect to when using the specified socket factory. If not set, the default port will be used.
mail.smtp.ssl.enable boolean If set to true, use SSL to connect and use the SSL port by default. Defaults to false for the "smtp" protocol and true for the "smtps" protocol.
mail.smtp.ssl.checkserveridentity boolean If set to true, check the server identity as specified by RFC 2595. These additional checks based on the content of the server's certificate are intended to prevent man-in-the-middle attacks. Defaults to false.
mail.smtp.ssl.socketFactory SSLSocketFactory If set to a class that extends the javax.net.ssl.SSLSocketFactory class, this class will be used to create SMTP SSL sockets. Note that this is an instance of a class, not a name, and must be set using the put method, not the setProperty method.
mail.smtp.ssl.socketFactory.class String If set, specifies the name of a class that extends the javax.net.ssl.SSLSocketFactory class. This class will be used to create SMTP SSL sockets.
mail.smtp.ssl.socketFactory.port int Specifies the port to connect to when using the specified socket factory. If not set, the default port will be used.
mail.smtp.ssl.protocols string Specifies the SSL protocols that will be enabled for SSL connections. The property value is a whitespace separated list of tokens acceptable to the javax.net.ssl.SSLSocket.setEnabledProtocols method.
mail.smtp.ssl.ciphersuites string Specifies the SSL cipher suites that will be enabled for SSL connections. The property value is a whitespace separated list of tokens acceptable to the javax.net.ssl.SSLSocket.setEnabledCipherSuites method.
mail.smtp.mailextension String Extension string to append to the MAIL command. The extension string can be used to specify standard SMTP service extensions as well as vendor-specific extensions. Typically the application should use the SMTPTransport method supportsExtension to verify that the server supports the desired service extension. See RFC 1869 and other RFCs that define specific extensions.
mail.smtp.starttls.enable boolean If true, enables the use of the STARTTLS command (if supported by the server) to switch the connection to a TLS-protected connection before issuing any login commands. Note that an appropriate trust store must configured so that the client will trust the server's certificate. Defaults to false.
mail.smtp.starttls.required boolean If true, requires the use of the STARTTLS command. If the server doesn't support the STARTTLS command, or the command fails, the connect method will fail. Defaults to false.
mail.smtp.userset boolean If set to true, use the RSET command instead of the NOOP command in the isConnected method. In some cases sendmail will respond slowly after many NOOP commands; use of RSET avoids this sendmail issue. Defaults to false.

Thursday, March 19, 2009

Disk-Based Parallel Computation, Rubik's Cube, and Checkpointing

Nice one.

The Promise and Peril of Jumbo Frames


The Promise and Peril of Jumbo Frames


via Coding Horror by Jeff Atwood on 3/1/09

We sit at the intersection of two trends:

  1. Most home networking gear, including routers, has safely transitioned to gigabit ethernet.
  2. The generation, storage, and transmission of large high definition video files is becoming commonplace.

If that sounds like you, or someone you know, there's one tweak you should know about that can potentally improve your local network throughput quite a bit -- enabling Jumbo Frames.

The typical UDP packet looks something like this:

udp packet diagram

But the default size of that data payload was established years ago. In the context of gigabit ethernet and the amount of data we transfer today, it does seem a bit.. anemic.

The original 1,518-byte MTU for Ethernet was chosen because of the high error rates and low speed of communications. If a corrupted packet is sent, only 1,518 bytes must be re-sent to correct the error. However, each frame requires that the network hardware and software process it. If the frame size is increased, the same amount of data can be transferred with less effort. This reduces CPU utilization (mostly due to interrupt reduction) and increases throughput by allowing the system to concentrate on the data in the frames, instead of the frames around the data.

I use my beloved energy efficient home theater PC as an always-on media server, and I'm constantly transferring gigabytes of video, music, and photos to it. Let's try enabling jumbo frames for my little network.

The first thing you'll need to do is update your network hardware drivers to the latest versions. I learned this the hard way, but if you want to play with advanced networking features like Jumbo Frames, you need the latest and greatest network hardware drivers. What was included with the OS is unlikely to cut it. Check on the network chipset manufacturer's website.

Once you've got those drivers up to date, look for the Jumbo Frames setting in the advanced properties of the network card. Here's what it looks like on two different ethernet chipsets:

gigabit jumbo marvell yukon advanced settings gigabit jumbo realtek advanced settings

That's my computer, and the HTPC, respectively. I was a little disturbed to notice that neither driver recognizes exactly the same data payload size. It's named "Jumbo Frame" with 2KB - 9KB settings in 1KB increments on the Realtek, and "Jumbo Packet" with 4088 or 9014 settings on the Marvell. I know that technically, for jumbo frames to work, all the networking devices on the subnet have to agree on the data payload size. I couldn't tell quite what to do, so I set them as you see above.

(I didn't change anything on my router / switch, which at the moment is the D-Link DGL-4500; note that most gigabit switches support jumbo frames, but you should always verify with the manufacturer's website to be sure.)

I then ran a few tests to see if there was any difference. I started with a simple file copy.

Default network settings

gigabit jumbo frames disabled file copy results

Jumbo Frames enabled

gigabit jumbo frames enabled file copy results

My file copy went from 47.6 MB/sec to 60.0 MB/sec. Not too shabby! But this is a very ad hoc sort of testing. Let's see what the PassMark Network Benchmark has to say.

Default network settings

gigabit jumbo frames disabled, throughput graph

Jumbo Frames enabled

gigabit jumbo frames enabled, throughput graph

This confirms what I saw with the file copy. With jumbo frames enabled, we go from 390,638 kilobits/sec to 477,927 kilobits/sec average. A solid 20% improvement.

Now, jumbo frames aren't a silver bullet. There's a reason jumbo frames are never enabled by default: some networking equipment can't deal with the non-standard frame sizes. Like all deviations from default settings, it is absolutely possible to make your networking worse by enabling jumbo frames, so proceed with caution. This SmallNetBuilder article outlines some of the pitfalls:

1) For a large frame to be transmitted intact from end to end, every component on the path must support that frame size.

The switch(es), router(s), and NIC(s) from one end to the other must all support the same size of jumbo frame transmission for a successful jumbo frame communication session.

2) Switches that don't support jumbo frames will drop jumbo frames.

In the event that both ends agree to jumbo frame transmission, there still needs to be end-to-end support for jumbo frames, meaning all the switches and routers must be jumbo frame enabled. At Layer 2, not all gigabit switches support jumbo frames. Those that do will forward the jumbo frames. Those that don't will drop the frames.

3) For a jumbo packet to pass through a router, both the ingress and egress interfaces must support the larger packet size. Otherwise, the packets will be dropped or fragmented.

If the size of the data payload can't be negotiated (this is known as PMTUD, packet MTU discovery) due to firewalls, the data will be dropped with no warning, or "blackholed". And if the MTU isn't supported, the data will have to be fragmented to a supported size and retransmitted, reducing throughput.

In addition to these issues, large packets can also hurt latency for gaming and voice-over-IP applications. Bigger isn't always better.

Still, if you regularly transfer large files, jumbo frames are definitely worth looking into. My tests showed a solid 20% gain in throughput, and for the type of activity on my little network, I can't think of any downside.

Tuesday, March 3, 2009

System calls analysys tool

strace -cp 18875

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 77.61    0.019695           0    782458       read
 12.31    0.003123           0      9456        write
  3.60    0.000913           0     47291        fcntl64
  3.32    0.000843           0      9463         open
  1.34    0.000341           0     18920        close
  1.27    0.000323           0     18914        dup2
  0.55    0.000140           0      9457         rt_sigprocmask
  0.00    0.000000           0         6           fstat64
  0.00    0.000000           0        12          getdents64
------ ----------- ----------- --------- --------- ----------------
100.00    0.025378                895977           total


Numbers Everyone Should Know

Google AppEngine Numbers

This group of numbers is from Brett Slatkin in Building Scalable Web Apps with Google App Engine.

Writes are expensive!

  • Datastore is transactional: writes require disk access
  • Disk access means disk seeks
  • Rule of thumb: 10ms for a disk seek
  • Simple math: 1s / 10ms = 100 seeks/sec maximum
  • Depends on:
    * The size and shape of your data
    * Doing work in batches (batch puts and gets)

    Reads are cheap!

  • Reads do not need to be transactional, just consistent
  • Data is read from disk once, then it's easily cached
  • All subsequent reads come straight from memory
  • Rule of thumb: 250usec for 1MB of data from memory
  • Simple math: 1s / 250usec = 4GB/sec maximum
    * For a 1MB entity, that's 4000 fetches/sec

    Numbers Miscellaneous

    This group of numbers is from a presentation Jeff Dean gave at a Engineering All-Hands Meeting at Google.

  • L1 cache reference 0.5 ns
  • Branch mispredict 5 ns
  • L2 cache reference 7 ns
  • Mutex lock/unlock 100 ns
  • Main memory reference 100 ns
  • Compress 1K bytes with Zippy 10,000 ns
  • Send 2K bytes over 1 Gbps network 20,000 ns
  • Read 1 MB sequentially from memory 250,000 ns
  • Round trip within same datacenter 500,000 ns
  • Disk seek 10,000,000 ns
  • Read 1 MB sequentially from network 10,000,000 ns
  • Read 1 MB sequentially from disk 30,000,000 ns
  • Send packet CA->Netherlands->CA 150,000,000 ns

    The Lessons

  • Writes are 40 times more expensive than reads.
  • Global shared data is expensive. This is a fundamental limitation of distributed systems. The lock contention in shared heavily written objects kills performance as transactions become serialized and slow.
  • Architect for scaling writes.
  • Optimize for low write contention.
  • Optimize wide. Make writes as parallel as you can.


  • Web 2.0 Tools For Project Management