﻿<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>奋斗足迹&#124;崔玉松 &#187; 崔玉松</title>
	<atom:link href="http://fendou.org/author/admin/feed/" rel="self" type="application/rss+xml" />
	<link>http://fendou.org</link>
	<description>For family, for myself, for life~~</description>
	<lastBuildDate>Fri, 24 Feb 2012 15:21:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>An Introduction to Taobao's Fourinone and a Performance Comparison with Hadoop</title>
		<link>http://fendou.org/2012/02/24/taobao-fourinone-vs-hadoop/</link>
		<comments>http://fendou.org/2012/02/24/taobao-fourinone-vs-hadoop/#comments</comments>
		<pubDate>Fri, 24 Feb 2012 15:19:58 +0000</pubDate>
		<dc:creator>崔玉松</dc:creator>
				<category><![CDATA[Study & Reading]]></category>
		<category><![CDATA[Hadoop]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://fendou.org/?p=871</guid>
		<description><![CDATA[FourInOne (Chinese name "四不像") is a four-in-one distributed computing framework. Before writing it, I spent a long time thinking about distributed computing and studied other open-source frameworks written abroad. When we treat the complexity of Hadoop as an academic subject to be studied, we seem to forget why we wanted to solve the problem in the first place: we simply want to write a program that makes several machines, or many more, compute together, putting more CPU and memory to work on problems with large data volumes and complex computation, while of course handling distributed coordination and failures along the way. If that simple goal is all we are after, why does everything have to be so complicated? I felt I could write something simpler: something that needs no over-engineering and only has to look a little cooler, be a little smaller, and do a little more. So I put my own understanding of distributed systems into this framework and, given how similar the underlying implementation techniques are, combined four major distributed computing capabilities (those of Hadoop, ZooKeeper, MQ, and a distributed cache) into a single framework, greatly simplifying and consolidating complex distributed computing applications...  <a href="http://fendou.org/2012/02/24/taobao-fourinone-vs-hadoop/" class="more-link" title="Read An Introduction to Taobao's Fourinone and a Performance Comparison with Hadoop">Read more &#187;</a>]]></description>
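			<content:encoded><![CDATA[<p><a href="http://baike.baidu.com/view/6402003.html">FourInOne</a> (Chinese name "四不像") is a four-in-one distributed computing framework. Before writing it, I spent a long time thinking about distributed computing and studied other open-source frameworks written abroad. When we treat the complexity of Hadoop as an academic subject to be studied, we seem to forget why we wanted to solve the problem in the first place: we simply want to write a program that makes several machines, or many more, compute together, putting more CPU and memory to work on problems with large data volumes and complex computation, while of course handling distributed coordination and failures along the way. If that simple goal is all we are after, why does everything have to be so complicated? I felt I could write something simpler: something that needs no over-engineering and only has to look a little cooler, be a little smaller, and do a little more. So I put my own understanding of distributed systems into this framework and, given how similar the underlying implementation techniques are, combined four major distributed computing capabilities (those of Hadoop, ZooKeeper, MQ, and a distributed cache) into a single framework, greatly simplifying and consolidating complex distributed computing applications.</p>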
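<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td></td>
<td>fourinone-1.11.09</td>
<td>hadoop-0.21.0</td>
</tr>
<tr>
<td>Size</td>
<td>82 KB</td>
<td>71 MB</td>
</tr>
<tr>
<td>Dependencies</td>
<td>A single jar, no dependencies</td>
<td>About 12 jar dependencies</td>
</tr>
<tr>
<td>Configuration</td>
<td>A single configuration file</td>
<td>Many configuration files with complex properties</td>
</tr>
<tr>
<td>Cluster setup</td>
<td>Simple: put one jar and the configuration file on each machine</td>
<td>Complex: requires Linux operating skills and involved setup such as ssh, plus many configuration files</td>
</tr>
<tr>
<td>Computing model</td>
<td>Two modes: the contractor and the workers interact directly, or they interact through a message hub; in the latter mode the worker nodes do not need to be directly reachable</td>
<td>Geared more toward parallel reading of file data than toward designing the computation itself. The JobTracker interacts directly with the TaskTrackers; after querying the NameNode, a TaskTracker fetches data directly from the DataNodes.</td>
</tr>
<tr>
<td>Parallelism</td>
<td>N*N: supports single-machine parallelism, multi-machine parallelism, and multiple instances per machine across machines</td>
<td>1*N: no single-machine parallelism; only one instance per machine across multiple machines</td>
</tr>
<tr>
<td>In-memory processing</td>
<td>Supports designing and developing applications in memory, with complete built-in distributed caching</td>
<td>Processes data as HDFS files; support for in-memory computation is weak</td>
</tr>
<tr>
<td>File handling</td>
<td>Built-in file adapter handles I/O</td>
<td>HDFS handles file I/O</td>
</tr>
<tr>
<td>Data requirements</td>
<td>Any data format and any data source, including databases, distributed files, and distributed caches</td>
<td>File data inside HDFS, mostly newline-delimited data</td>
</tr>
<tr>
<td>Scheduling role</td>
<td>Contractor (包工头); there can be several, chained processing is supported, and a senior contractor can schedule junior contractors</td>
<td>JobTracker, usually co-located with the NameNode</td>
</tr>
<tr>
<td>Task execution role</td>
<td>Migrant worker (农民工); the framework supports designing several types of workers for splitting or merging tasks</td>
<td>TaskTracker, usually co-located with a DataNode</td>
</tr>
<tr>
<td>Intermediate result storage</td>
<td>Warehouse (手工仓库), or any other database or storage device</td>
<td>HDFS intermediate result files</td>
</tr>
<tr>
<td>Split strategy</td>
<td>Free design: the framework provides chained processing to split large business scenarios into stages; how data storage and computation are split is defined by the business scenario</td>
<td>Storage is split into 64 MB blocks and computation is split by line<br />
Implement the map interface to process data line by line</td>
</tr>
<tr>
<td>Merge strategy</td>
<td>Free design: the framework provides merge interfaces between worker nodes, which can interact with one another to implement a merge strategy; merging can also go through the contractor</td>
<td>TaskTrackers are opaque and offer little programmatic control, so merge strategies are complex to design<br />
Implement the reduce interface to merge intermediate data</td>
</tr>
<tr>
<td>Memory usage</td>
<td>No need to specify JVM memory; the defaults are fine, and JVM memory is increased only if the computation requires it</td>
<td>JVM memory must be specified; each process defaults to 1 GB, and typically three processes (NameNode, JobTracker, and so on) are started, consuming 3 GB</td>
</tr>
<tr>
<td>Monitoring</td>
<td>The framework's multi-stage chained processing supports monitoring the process; programmable monitoring gives application developers maximum flexibility in implementing their monitoring needs, and to preserve performance the framework does not emit large volumes of system monitoring logs</td>
<td>Emits a lot of system monitoring logs, such as map and reduce percentages, at a cost in performance; business-level monitoring must be implemented by the user</td>
</tr>
<tr>
<td>Packaging and deployment</td>
<td>Script tools</td>
<td>Upload the jar to the JobTracker machine</td>
</tr>
<tr>
<td>Platform support</td>
<td>Cross-platform, with good Windows support</td>
<td>Mostly targets Linux; Windows support is poor, requires an emulated Linux environment, and is recommended only for development and learning</td>
</tr>
<tr>
<td>Other</td>
<td>Supports features closely related to distributed computing, such as coordination and consistency, distributed caching, and message queues</td>
<td>Not supported</td>
</tr>
<tr>
<td>Summary:</td>
<td colspan="2">Hadoop was not designed primarily as a parallel computing framework that offers fast, flexible computation for all kinds of scenarios. It is more of a distributed file system that stores and queries file data, and its map/reduce leans toward querying file data in parallel. Fourinone is the opposite.</td>
</tr>
</tbody>
</table>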
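<p>To make the contractor/worker idea in the table concrete, here is a minimal Python sketch of a word count in which a "contractor" hands chunks of text to "workers" and merges their partial results. It is not Fourinone's API: the function names <code>worker_count</code> and <code>contractor_wordcount</code> are hypothetical, and threads on one machine stand in for worker nodes.</p>

```python
from __future__ import annotations

from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def worker_count(chunk: str) -> Counter:
    """Worker (hypothetical name): count the words in one chunk of text."""
    return Counter(chunk.split())


def contractor_wordcount(chunks: list[str], workers: int = 4) -> Counter:
    """Contractor (hypothetical name): split work across workers, then merge.

    Threads are used here only as a stand-in for worker nodes; a real
    deployment would dispatch the chunks to other processes or machines.
    """
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is handled by one worker; partial counts are merged here.
        for partial in pool.map(worker_count, chunks):
            total.update(partial)
    return total


if __name__ == "__main__":
    print(contractor_wordcount(["a b a", "b c", "a c c"]))
```

<p>The merge step lives in the contractor in this sketch; the table notes that Fourinone also lets workers merge among themselves, which this example does not attempt to show.</p>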
<p>&nbsp;</p>
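<p>Comparative test of Fourinone and Hadoop running wordcount (machines averaging 4 cores and 4 GB of memory; the input data are files):</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td></td>
<td>fourinone-1.11.09(n*4)</td>
<td>fourinone-1.11.09(n*1)</td>
<td>hadoop-0.21.0(n*1)</td>
</tr>
<tr>
<td>3 machines * 256 MB</td>
<td>4s</td>
<td>12s</td>
<td>72s</td>
</tr>
<tr>
<td>3 machines * 512 MB</td>
<td>7s</td>
<td>30s</td>
<td>140s</td>
</tr>
<tr>
<td>3 machines * 1 GB</td>
<td>14s</td>
<td>50s</td>
<td>279s</td>
</tr>
<tr>
<td>19 machines * 1 GB</td>
<td>21s</td>
<td>60s</td>
<td>289s</td>
</tr>
<tr>
<td>10 machines * 2 GB</td>
<td>29s</td>
<td></td>
<td></td>
</tr>
<tr>
<td>5 machines * 4 GB</td>
<td>60s</td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
<p>Notes: Fourinone can make full use of single-machine parallelism, so a 4-core machine can run 4 parallel instances, whereas Hadoop currently supports only N*1. In addition, the table above shows that to process 20 GB of data, Fourinone needs only 5 machines and 60 seconds. Compared with Hadoop using 19 machines to process 19 GB in 289 seconds, that saves 14 machines and finishes more than 200 seconds sooner.</p>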
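<p>The arithmetic behind that comparison can be checked with a few lines of Python. The figures are copied from the benchmark table; the helper names <code>total_gb</code> and <code>throughput</code> are only illustrative, and aggregate GB per second is one possible way to compare the rows, not a metric the original test reports.</p>

```python
# Rows copied from the benchmark table: (machines, GB per machine, seconds).
fourinone_n4 = (5, 4.0, 60)   # fourinone-1.11.09 (n*4), 5 machines * 4 GB
hadoop_n1 = (19, 1.0, 289)    # hadoop-0.21.0 (n*1), 19 machines * 1 GB


def total_gb(row):
    """Total data volume processed in one table row."""
    machines, gb_per_machine, _seconds = row
    return machines * gb_per_machine


def throughput(row):
    """Aggregate throughput in GB per second for one table row."""
    return total_gb(row) / row[2]


machines_saved = hadoop_n1[0] - fourinone_n4[0]
seconds_saved = hadoop_n1[2] - fourinone_n4[2]

if __name__ == "__main__":
    print("data (GB):", total_gb(fourinone_n4), "vs", total_gb(hadoop_n1))
    print("machines saved:", machines_saved)
    print("seconds saved:", seconds_saved)
    print("GB/s:", throughput(fourinone_n4), "vs", throughput(hadoop_n1))
```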
<p>Download the distributed framework (jar format) and demo code:</p>
<p><a href="http://download.csdn.net/detail/fourinone/3557912">http://download.csdn.net/detail/fourinone/3557912</a></p>
<p><a href="http://www.skycn.com/soft/68321.html">http://www.skycn.com/soft/68321.html</a></p>
<p>&nbsp;</p>
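<blockquote><p>About the author:<br />
Stone.Peng<br />
Veteran IT professional<br />
Currently a senior expert at Taobao, researching core Internet technologies<br />
Previously an SOA architect in Kingdee's overall architecture department, responsible for ESB design</p></blockquote>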
]]></content:encoded>
			<wfw:commentRss>http://fendou.org/2012/02/24/taobao-fourinone-vs-hadoop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>TF-IDF and Text Similarity Measures</title>
		<link>http://fendou.org/2012/01/17/tf-idf/</link>
		<comments>http://fendou.org/2012/01/17/tf-idf/#comments</comments>
		<pubDate>Tue, 17 Jan 2012 03:45:57 +0000</pubDate>
		<dc:creator>崔玉松</dc:creator>
				<category><![CDATA[Excellence Article]]></category>
		<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://fendou.org/?p=867</guid>
		<description><![CDATA[TF-IDF (term frequency–inverse document frequency) is a weighting technique commonly used in information retrieval and text mining. It is a statistical measure of how important a word is to one document in a collection or corpus. A word's importance increases in proportion to the number of times it appears in the document, but is offset by how frequently it appears across the corpus. Variants of TF-IDF weighting are often used by search engines to score or rank a document's relevance to a user query. Besides TF-IDF, Internet search engines also use ranking methods based on link analysis to determine the order in which documents appear in search results.]]></description>
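			<content:encoded><![CDATA[<p><strong>TF-IDF</strong> (term frequency–inverse document frequency) is a weighting technique commonly used in information retrieval and text mining. It is a statistical measure of how important a word is to one document in a collection or corpus. A word's importance increases in proportion to the number of times it appears in the document, but is offset by how frequently it appears across the corpus. Variants of TF-IDF weighting are often used by search engines to score or rank a document's relevance to a user query. Besides TF-IDF, Internet search engines also use ranking methods based on link analysis to determine the order in which documents appear in search results.</p>
<p>In a given document, the <strong>term frequency</strong> (TF) is the number of times a given term appears in that document. This count is usually normalized to prevent a bias toward long documents. (The same term may have a higher count in a long document than in a short one, regardless of its actual importance.) For a term <em>t</em><sub><em>i</em></sub> in a particular document, its importance can be expressed as:</p>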
<dl>
<dd><img src="http://upload.wikimedia.org/math/e/5/a/e5a7b43197068eddf42859f3995ebf15.png" alt=" \mathrm{tf_{i,j}} = \frac{n_{i,j}}{\sum_k  n_{k,j}}" /></dd>
</dl>
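<p>In the formula above, <em>n</em><sub><em>i</em>,<em>j</em></sub> is the number of occurrences of the term in document <em>d</em><sub><em>j</em></sub>, and the denominator is the sum of the occurrences of all terms in document <em>d</em><sub><em>j</em></sub>.</p>
<p>The <strong>inverse document frequency</strong> (IDF) measures the general importance of a term. The IDF of a particular term is obtained by dividing the total number of documents by the number of documents containing the term, and then taking the logarithm of that quotient:</p>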
<dl>
<dd><img src="http://upload.wikimedia.org/math/4/8/9/489bb0915a2757f6ffac62a7e14fff0e.png" alt=" \mathrm{idf_i} =  \log \frac{|D|}{|\{d: d \ni  t_{i}\}|}" /></dd>
</dl>
<p>where</p>
<ul>
<li>|D|: the total number of documents in the corpus</li>
<li><img src="http://upload.wikimedia.org/math/e/3/7/e3792477cf3b2231dfbb6680fc5e75e2.png" alt=" |\{d :d\ni t_{i}\}| " />: the number of documents containing the term <em>t</em><sub><em>i</em></sub> (that is, the number of documents for which <img src="http://upload.wikimedia.org/math/3/f/4/3f49d11f43e671a36ff945ac2d13bc20.png" alt=" n_{i} \neq 0" />)</li>
</ul>
<p>Then</p>
<dl>
<dd><img src="http://upload.wikimedia.org/math/2/6/9/26962b563a286ba1de69e8a44f87a8ba.png" alt=" \mathrm{tf{}idf_{i,j}} = \mathrm{tf_{i,j}}  \cdot  \mathrm{idf_{i}} " /></dd>
</dl>
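<p>A high term frequency within a particular document, combined with a low document frequency of that term across the whole collection, yields a high TF-IDF weight. TF-IDF therefore tends to filter out common terms and keep the important ones.</p>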
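<p>The three formulas above translate almost directly into code. The sketch below assumes documents are already tokenized into lists of words and uses the natural logarithm, since the text does not fix a base; the function names and the two-document corpus are illustrative only.</p>

```python
import math
from collections import Counter


def tf(term, doc):
    """Term frequency: occurrences of `term` divided by all tokens in `doc`."""
    if not doc:
        raise ValueError("document is empty")
    return Counter(doc)[term] / len(doc)


def idf(term, corpus):
    """Inverse document frequency: log(|D| / number of docs containing term)."""
    containing = sum(1 for doc in corpus if term in doc)
    if containing == 0:
        # The formula is undefined when no document contains the term.
        raise ValueError(f"term {term!r} does not occur in the corpus")
    return math.log(len(corpus) / containing)


def tf_idf(term, doc, corpus):
    """TF-IDF weight of `term` in `doc`, relative to `corpus`."""
    return tf(term, doc) * idf(term, corpus)


if __name__ == "__main__":
    corpus = [
        ["this", "is", "a", "a", "sample"],
        ["this", "is", "another", "another", "example", "example", "example"],
    ]
    # "this" occurs in every document, so its weight collapses to zero.
    print(tf_idf("this", corpus[0], corpus))
    # "example" occurs only in the second document, so it gets a positive weight.
    print(tf_idf("example", corpus[1], corpus))
```

<p>A term found in every document gets an IDF of log(1) = 0, which is the "filtering out common terms" effect described above.</p>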
<p>=================Text Similarity Measures=======================</p>
<p><strong>Method 1: Vector space model</strong></p>
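<p>In the vector space model, "text" refers broadly to any machine-readable record and is denoted D (document). A feature term (term, denoted t) is a basic linguistic unit that appears in document D and can represent its content, typically a word or phrase. A text can be represented by its set of feature terms as D(T1, T2, …, Tn), where Tk is a feature term and 1&lt;=k&lt;=n. For example, if a document has four feature terms a, b, c, and d, it can be represented as D(a, b, c, d). For a text with n feature terms, each term is usually assigned a weight indicating its importance: D = D(T1, W1; T2, W2; …; Tn, Wn), abbreviated D = D(W1, W2, …, Wn), which we call the vector representation of text D, where Wk is the weight of Tk and 1&lt;=k&lt;=n. In the example above, if the weights of a, b, c, and d are 30, 20, 20, and 10, the vector representation of the text is D(30, 20, 20, 10). In the vector space model, the content relevance Sim(D1, D2) between two texts D1 and D2 is commonly expressed as the cosine of the angle between their vectors:<br />
<img src="http://www.xd-tech.com.cn/blog/attachments/month_0610/n20061011103816.jpg" alt=" \mathrm{Sim}(D_1, D_2) = \cos\theta = \frac{\sum_{k=1}^{n} W_{1k} W_{2k}}{\sqrt{\sum_{k=1}^{n} W_{1k}^2}\,\sqrt{\sum_{k=1}^{n} W_{2k}^2}}" /><br />
where W1k and W2k are the weights of the k-th feature term of texts D1 and D2, and 1&lt;=k&lt;=n.<br />
In automatic classification, a similar method can be used to compute the relevance between a document to be classified and a category. For example, suppose text D1 has feature terms a, b, c, d with weights 30, 20, 20, 10, and category C1 has feature terms a, c, d, e with weights 40, 30, 20, 10. Over the combined terms (a, b, c, d, e), D1 is represented as D1(30, 20, 20, 10, 0) and C1 as C1(40, 0, 30, 20, 10), and the formula above gives a relevance of about 0.86 between text D1 and category C1.</p>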
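<p>The 0.86 figure can be reproduced with a short Python sketch of cosine similarity. The function name <code>cosine_similarity</code> is illustrative, and returning 0.0 for a zero vector is a convention chosen here, not something the text specifies.</p>

```python
import math


def cosine_similarity(w1, w2):
    """Cosine of the angle between two equal-length weight vectors."""
    if len(w1) != len(w2):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(w1, w2))
    norm1 = math.sqrt(sum(a * a for a in w1))
    norm2 = math.sqrt(sum(b * b for b in w2))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # assumption: a zero vector is treated as unrelated
    return dot / (norm1 * norm2)


if __name__ == "__main__":
    # Weights over the combined feature terms (a, b, c, d, e) from the example.
    d1 = [30, 20, 20, 10, 0]
    c1 = [40, 0, 30, 20, 10]
    print(round(cosine_similarity(d1, c1), 2))  # 0.86
```

<p>Here the dot product is 2000 and the norms are √1800 and √3000, so the ratio is roughly 0.8607.</p>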
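<p><strong>Method 2: String similarity</strong></p>
<p>There are many algorithms for computing the similarity of strings; common ones include longest common substring and edit distance.</p>
<p>Edit distance is the minimum number of insertions, deletions, and substitutions needed to transform a source string (s) into a target string (t). It is widely used in NLP, for example in evaluation metrics such as WER and mWER, and is also commonly used to count the changes made to an original text. The algorithm was first proposed by the Russian scientist Levenshtein, so it is also called Levenshtein distance.</p>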
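<p>A common way to compute edit distance is dynamic programming over prefixes of the two strings. The sketch below is one standard formulation that keeps only two rows of the table; the function name <code>levenshtein</code> is illustrative, and unit cost for every operation is assumed.</p>

```python
def levenshtein(s, t):
    """Minimum number of insertions, deletions, and substitutions from s to t."""
    # prev[j] holds the distance between the current prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]  # turning s[:i] into the empty string takes i deletions
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            cur.append(min(
                prev[j] + 1,         # delete cs from s
                cur[j - 1] + 1,      # insert ct into s
                prev[j - 1] + cost,  # substitute cs with ct (free if equal)
            ))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
```

<p>Keeping two rows instead of the full table reduces memory from O(len(s) * len(t)) to O(len(t)) without changing the result.</p>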
]]></content:encoded>
			<wfw:commentRss>http://fendou.org/2012/01/17/tf-idf/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>2011 Year in Review</title>
		<link>http://fendou.org/2012/01/13/2011/</link>
		<comments>http://fendou.org/2012/01/13/2011/#comments</comments>
		<pubDate>Fri, 13 Jan 2012 13:02:39 +0000</pubDate>
		<dc:creator>崔玉松</dc:creator>
				<category><![CDATA[Life Diary]]></category>
		<category><![CDATA[Memories]]></category>
		<category><![CDATA[Blessings]]></category>

		<guid isPermaLink="false">http://fendou.org/?p=864</guid>
		<description><![CDATA[After a busy year I can finally take a break. If I had to sum up work in 2011 in one word, it would be "busy". The first half was manageable, but the second half was exhausting. Every time I filed taxi receipts for reimbursement, I saw that on more than 80% of the days in a two-month stretch I had gone home after 10 p.m., and a few times I got home in the early hours and was back at the office by 9 a.m. I really was working myself to the bone. Looking back at the summary of 2010 that I wrote in early 2011, I find I have become much more professional. There is not much to sum up for 2011, really. The team's business did all right. In the middle of the year I tried to switch roles to try something I had never worked on; it did not work out, though it seemed quite interesting. Most of the wishes I made at the start of 2011 went unfulfilled. Time was indeed short, but on the whole, with enough diligence you can always find some. I read a lot of books in 2011, mostly on databases and search engines, and even planned to take a database certification exam; I finished the books but never found time for the exam. Never mind, I can do without it. Overall, 2011 went fairly smoothly. I neither gained nor lost weight, my health was fine, I had no serious illness, and I went to the hospital once or twice. 2012 shows no sign of the end of the world, so life goes on. There is a lot to do in 2012. Settling down to do one thing I have wanted to do but never did may be hard, and I hope I can stick with it. In 2012 I hope to bring continuous integration into my projects so that automation replaces some of the manual work; my colleagues in testing work too hard, and I would also like to sleep soundly. There are still books worth looking forward to in 2012, and reading 10 of them is a must. In 2012 I should get a new phone; the old one never broke, so replacing it has felt wasteful. In 2012 I want to make more friends; if the world is ending, at least we will have company on the road to the underworld. In 2012 I want to be more active and blog more. In 2012 I hope to get home earlier after work. In 2012 I hope my family and I stay healthy, since health is the capital everything else depends on.]]></description>
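			<content:encoded><![CDATA[<p>After a busy year, I can finally take a break. If I had to sum up my work in 2011 in one word, it would be "busy". The first half of the year was fine, but the second half was exhausting. It was only when filing taxi receipts for reimbursement that I noticed that, over two months, I had gone home after 10 p.m. more than 80% of the time; a few times I even got home in the small hours and was back at the office by 9 a.m. I really was working as if my life depended on it. Looking back at <a title="Summary of 2010" href="http://fendou.org/2011/01/14/another-year/" target="_blank">the summary of 2010 that I wrote in early 2011</a>, I find that I have become much more professional.<br />
There is not much to summarize about 2011. The team's business went reasonably well. In the middle of the year I tried to switch roles to work on something I had never touched before; it did not work out, but it was still an interesting experience.<br />
Almost none of the wishes I made in early 2011 came true. Time really was short, but on the whole, if I had pushed harder I could always have found some. I read a lot of books in 2011, mostly on databases and search engines. I even planned to take a database certification exam; I finished the books but never found the time to sit the exam. Never mind, I can do without it.<br />
Overall, 2011 went fairly smoothly. I neither gained nor lost weight, and my health was fine: no serious illness, just one or two hospital visits.</p>
<p>There is no sign that the world will end in 2012, so life goes on. There is a lot to do in 2012. I want to settle down and do one thing I have wanted to do but never did; it may be hard, but I hope I can stick with it. In 2012 I hope to bring continuous integration into my projects and let automation replace part of the manual work; my colleagues in testing work too hard, and I would also like to sleep soundly. There are still books worth looking forward to in 2012, and reading 10 of them is a must. In 2012 I should get a new phone; the old one has never broken, so replacing it feels wasteful. In 2012 I want to make more friends: if the world does end, at least we will have company on the road to the underworld. In 2012 I want to be more active and blog more. In 2012 I hope to get home earlier after work. In 2012 I hope my family and I stay healthy, because health is the foundation for everything else.</p>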
]]></content:encoded>
			<wfw:commentRss>http://fendou.org/2012/01/13/2011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Python Web Server Gateway Interface v1.0.1 (Incomplete Translation)</title>
		<link>http://fendou.org/2011/11/23/python-web-server-gateway-interface-v1-0-1/</link>
		<comments>http://fendou.org/2011/11/23/python-web-server-gateway-interface-v1-0-1/#comments</comments>
		<pubDate>Wed, 23 Nov 2011 03:07:34 +0000</pubDate>
		<dc:creator>崔玉松</dc:creator>
				<category><![CDATA[Excellence Article]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[译文]]></category>

		<guid isPermaLink="false">http://fendou.org/?p=860</guid>
		<description><![CDATA[Contents Introduction Rationale and Goals Specification Overview The Application/Framework Side The Server/Gateway Side Middleware: Components that Play Both Sides Specification Details environ Variables Input and Error Streams The start_response() Callable Handling the Content-Length Header Buffering and Streaming Middleware Handling of Block Boundaries The write() Callable Unicode Issues Error Handling HTTP 1.1 Expect/Continue Other HTTP Features Thread Support Implementation/Application Notes Server Extension APIs Application Configuration URL Reconstruction Supporting Older (]]></description>
			<content:encoded><![CDATA[<p>Contents</p>
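<p>Introduction<br />
Rationale and Goals<br />
Specification Overview<br />
The Application/Framework Side<br />
The Server/Gateway Side<br />
Middleware: Components that Play Both Sides<br />
Specification Details<br />
environ Variables<br />
Input and Error Streams<br />
The start_response() Callable<br />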
Handling the Content-Length Header<br />
Buffering and Streaming<br />
Middleware Handling of Block Boundaries<br />
The write() Callable<br />
Unicode Issues<br />
Error Handling<br />
HTTP 1.1 Expect/Continue<br />
Other HTTP Features<br />
Thread Support<br />
Implementation/Application Notes<br />
Server Extension APIs<br />
Application Configuration<br />
URL Reconstruction<br />
Supporting Older (<2.2) Versions of Python<br />
Optional Platform-Specific File Handling<br />
Questions and Answers<br />
Proposed/Under Discussion<br />
Acknowledgements<br />
References<br />
Copyright</p>
<p>Introduction</p>
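<p>This document specifies a standard interface between web servers and Python web applications or frameworks, to promote web application portability across a variety of web servers.</p>
<p>Rationale and Goals</p>
<p>Python currently boasts a wide variety of web application frameworks, such as Zope, Quixote, Webware, SkunkWeb, PSO, and Twisted Web. This wide variety of choices can be a problem for new Python users, because, generally speaking, the choice of web framework limits the choice of usable web servers.</p>
<p>By contrast, although Java has just as many web application frameworks available, Java's "servlet" API makes it possible for applications written with any Java web application framework to run in any web server that supports the servlet API. The availability and widespread use of such an API in web servers for Python (whether those servers are written in Python, embed Python, or invoke Python via a gateway protocol) would separate the choice of framework from the choice of web server, freeing users to choose a pairing that suits them, while freeing framework and server developers to focus on their preferred area of specialization.</p>
<p>This PEP, therefore, proposes a simple and universal interface between web servers and web applications or frameworks: the Python Web Server Gateway Interface (WSGI).</p>
<p>But the mere existence of a WSGI spec does nothing to address the existing state of servers and frameworks for Python web applications. Server and framework authors must actually implement WSGI for it to have any effect.</p>
<p>However, since no existing servers or frameworks support WSGI, there is little immediate reward for an author who implements WSGI support. Thus, WSGI must be easy to implement, so that an author's initial investment in the interface can be reasonably low.</p>
<p>Simplicity of implementation on both the server and framework sides of the interface is therefore absolutely critical to the utility of WSGI, and is the principal criterion for any design decision.</p>
<p>Note, however, that simplicity of implementation for a framework author is not the same thing as ease of use for a web application author. WSGI presents an absolutely "no frills" interface to the framework author, because extras such as response objects and cookie handling would just get in the way of existing frameworks' handling of these issues. Again, the goal of WSGI is to make it easy to interconnect existing servers and applications or frameworks, not to create a new web framework.</p>
<p>Note also that this goal precludes WSGI from requiring anything that is not already available in deployed versions of Python. Therefore, this specification neither proposes nor requires new standard library modules, and nothing in WSGI requires a Python version greater than 2.2.2. (It would be a good idea, however, for future versions of Python to include support for this interface in web servers provided by the standard library.)</p>
<p>In addition to ease of implementation for existing and future frameworks and servers, it should also be easy to create request preprocessors, response postprocessors, and other WSGI-based "middleware" components that look like an application to their containing server, while acting as a server for their contained applications.</p>
<p>If middleware can be both simple and robust, and WSGI is widely available in servers and frameworks, it allows for the possibility of an entirely new kind of Python web application framework: one consisting of loosely coupled WSGI middleware components. Indeed, existing framework authors may even choose to refactor their frameworks' existing services to be provided in this way, becoming more like libraries used with WSGI and less like monolithic frameworks. This would then allow application developers to choose the best-suited component for each piece of functionality, rather than having every feature supplied by a single framework.</p>
<p>Of course, that day is doubtless quite far off. In the meantime, a sufficient short-term goal for WSGI is to enable the use of any framework with any server.</p>
<p>Finally, it should be mentioned that the current version of WSGI does not prescribe any particular mechanism for deploying an application for use with a web server or server gateway. At present, this is necessarily defined by the server or gateway implementation. After a sufficient number of servers and frameworks have implemented WSGI and provided field experience with varying deployment requirements, it may make sense to create another PEP describing a deployment standard for WSGI servers and application frameworks.</p>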
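<p>Specification Overview</p>
<p>The WSGI interface has two sides: the "server" or "gateway" side, and the "application" or "framework" side. The server side invokes a callable object that is provided by the application side. The specifics of how that object is provided are up to the server or gateway. It is assumed that some servers or gateways will require an application's deployer to write a short script that creates an instance of the server or gateway and supplies it with the application object. Other servers and gateways may use configuration files or other mechanisms to specify where an application object should be imported from, or otherwise obtained.</p>
<p>In addition to "pure" servers/gateways and applications/frameworks, it is also possible to create "middleware" components that implement both sides of this specification. Such components act as an application to their containing server, and as a server to a contained application, and can be used to provide extended APIs, content transformation, navigation, and other useful functions.</p>
<p>Throughout this specification, we will use the term "a callable" to mean "a function, method, class, or an instance with a __call__ method". It is up to the server, gateway, or application implementing the callable to choose the appropriate implementation technique for its needs. Conversely, a server, gateway, or application that is invoking a callable must not depend on what kind of callable was provided to it. Callables are only to be called, not introspected upon.</p>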
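<p>A minimal sketch of the four kinds of callable listed above, assuming Python 3 (where response bodies are byte strings, as in PEP 3333, rather than the Python 2 strings used elsewhere in this text). Every function and class name below is hypothetical and exists only for illustration; the point is that the caller treats all four the same way.</p>

```python
# Four kinds of "callable" in the sense used by the specification.
# All names here are hypothetical and exist only for this illustration.

def as_function(environ, start_response):
    return [b"function"]


class Holder:
    def as_method(self, environ, start_response):
        return [b"method"]


class AsClass:
    # Calling the class creates an instance; the instance is the iterable result.
    def __init__(self, environ, start_response):
        pass

    def __iter__(self):
        yield b"class"


class WithCall:
    def __call__(self, environ, start_response):
        return [b"instance"]


def invoke(app):
    """Invoke app positionally, without inspecting what kind of callable it is."""
    return b"".join(app({}, lambda status, headers, exc_info=None: None))


candidates = [as_function, Holder().as_method, AsClass, WithCall()]
results = [invoke(app) for app in candidates]
```

<p>The helper invoke() never checks the type of its argument, which is the "called, not introspected upon" rule in practice.</p>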
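<p>The Application/Framework Side</p>
<p>The application object is simply a callable object that accepts two arguments. The term "object" should not be misconstrued as requiring an actual object instance: a function, method, class, or instance with a __call__ method are all acceptable for use as an application object. Application objects must be able to be invoked more than once, as virtually all servers/gateways (other than CGI) will make such repeated requests.</p>
<p>(Note: although we refer to it as an "application" object, this should not be construed to mean that application developers will use WSGI as a web programming API. It is assumed that application developers will continue to use existing, high-level framework services to develop their applications. WSGI is a tool for framework and server developers, and is not intended to directly support application developers.)</p>
<p>Here are two example application objects; one is a function, and the other is a class:</p>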
<p>def simple_app(environ, start_response):<br />
    """Simplest possible application object"""<br />
    status = '200 OK'<br />
    response_headers = [('Content-type','text/plain')]<br />
    start_response(status, response_headers)<br />
    return ['Hello world!\n']</p>
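<p>class AppClass:<br />
    """Produce the same output, but using a class</p>
<p>    (Note: 'AppClass' is the "application" here, so calling it<br />
    returns an instance of 'AppClass', which is then the iterable<br />
    return value of the "application callable" as required by the spec.)</p>
<p>    If we wanted to use *instances* of 'AppClass' as application objects<br />
    instead, we would have to implement a ``__call__`` method, which would be<br />
    invoked to execute the application, and we would need to create an<br />
    instance for use by the server or gateway.<br />
    """</p>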
<p>    def __init__(self, environ, start_response):<br />
        self.environ = environ<br />
        self.start = start_response</p>
<p>    def __iter__(self):<br />
        status = '200 OK'<br />
        response_headers = [('Content-type','text/plain')]<br />
        self.start(status, response_headers)<br />
        yield "Hello world!\n"</p>
<p>The Server/Gateway Side</p>
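<p>The server or gateway invokes the application callable once for each request it receives from an HTTP client that is directed at the application. To illustrate, here is a simple CGI gateway, implemented as a function taking an application object. Note that this simple example has limited error handling, because by default an uncaught exception will be dumped to sys.stderr and logged by the web server.</p>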
<p>import os, sys</p>
<p>def run_with_cgi(application):</p>
<p>    environ = dict(os.environ.items())<br />
    environ['wsgi.input']        = sys.stdin<br />
    environ['wsgi.errors']       = sys.stderr<br />
    environ['wsgi.version']      = (1,0)<br />
    environ['wsgi.multithread']  = False<br />
    environ['wsgi.multiprocess'] = True<br />
    environ['wsgi.run_once']    = True</p>
<p>    if environ.get('HTTPS','off') in ('on','1'):<br />
        environ['wsgi.url_scheme'] = 'https'<br />
    else:<br />
        environ['wsgi.url_scheme'] = 'http'</p>
<p>    headers_set = []<br />
    headers_sent = []</p>
<p>    def write(data):<br />
        if not headers_set:<br />
             raise AssertionError("write() before start_response()")</p>
<p>        elif not headers_sent:<br />
             # Before the first output, send the stored headers<br />
             status, response_headers = headers_sent[:] = headers_set<br />
             sys.stdout.write('Status: %s\r\n' % status)<br />
             for header in response_headers:<br />
                 sys.stdout.write('%s: %s\r\n' % header)<br />
             sys.stdout.write('\r\n')</p>
<p>        sys.stdout.write(data)<br />
        sys.stdout.flush()</p>
<p>    def start_response(status,response_headers,exc_info=None):<br />
        if exc_info:<br />
            try:<br />
                if headers_sent:<br />
                    # Re-raise original exception if headers sent<br />
                    raise exc_info[0], exc_info[1], exc_info[2]<br />
            finally:<br />
                exc_info = None     # avoid dangling circular ref<br />
        elif headers_set:<br />
            raise AssertionError("Headers already set!")</p>
<p>        headers_set[:] = [status,response_headers]<br />
        return write</p>
<p>    result = application(environ, start_response)<br />
    try:<br />
        for data in result:<br />
            if data:    # don't send headers until body appears<br />
                write(data)<br />
        if not headers_sent:<br />
            write('')   # send headers now if body was empty<br />
    finally:<br />
        if hasattr(result,'close'):<br />
            result.close()</p>
<p>Middleware: Components that Play Both Sides</p>
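<p>Note that a single object may play the role of a server with respect to some applications, while also acting as an application with respect to some servers. Such "middleware" components can perform functions such as:</p>
<p>Routing a request to different application objects based on the target URL, after rewriting the environ accordingly<br />
Allowing multiple applications or frameworks to run side by side in the same process<br />
Load balancing and remote processing, by forwarding requests and responses over a network<br />
Performing content postprocessing, such as applying XSL stylesheets</p>
<p>The presence of middleware in general is transparent to both the server/gateway and the application/framework sides of the interface, and should require no special support. A user who wants to incorporate middleware into an application simply provides the middleware component to the server, as if it were an application, and configures the middleware component to invoke the application, as if the middleware component were a server.</p>
<p>Of course, the component that the middleware wraps may actually be another middleware component wrapping another application, and so on, creating what is referred to as a "middleware stack". For the most part, middleware must conform to the restrictions and requirements of both the server and application sides of WSGI. In some cases, however, the requirements for middleware are more stringent than for a "pure" server or application, and these points will be noted in the specification.</p>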
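<p>To make the idea of a middleware stack concrete, here is a small sketch, assuming Python 3 and byte-string bodies. The classes AddHeader and Upper, the application hello_app, and the header name X-Demo are all hypothetical and are not part of the specification. For brevity the sketch does not wrap the write() callable.</p>

```python
# Hypothetical middleware used only for illustration: each wrapper is an
# application to whatever calls it and a server to whatever it wraps.

def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


class AddHeader:
    """Append one extra response header (name and value are placeholders)."""

    def __init__(self, application, name, value):
        self.application = application
        self.name = name
        self.value = value

    def __call__(self, environ, start_response):
        def wrapped_start(status, response_headers, exc_info=None):
            headers = list(response_headers) + [(self.name, self.value)]
            return start_response(status, headers, exc_info)
        return self.application(environ, wrapped_start)


class Upper:
    """Upper-case every body block produced by the wrapped application."""

    def __init__(self, application):
        self.application = application

    def __call__(self, environ, start_response):
        result = self.application(environ, start_response)
        try:
            for block in result:
                yield block.upper()
        finally:
            # Pass close() through to the wrapped iterable, if it has one.
            if hasattr(result, "close"):
                result.close()


# A two-layer stack: Upper wraps AddHeader, which wraps hello_app.
stack = Upper(AddHeader(hello_app, "X-Demo", "1"))

captured = {}

def fake_start_response(status, response_headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = response_headers
    return lambda data: None

body = b"".join(stack({}, fake_start_response))
```

<p>Neither hello_app nor the fake server needs to know that two layers sit between them, which is the transparency described above.</p>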
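<p>Here is an example of a middleware component that converts text/plain responses to pig latin, using Joe Strout's piglatin.py. (Note: a real middleware component should check the content type in a more robust way, and should also check for a content encoding. This simple example also ignores the possibility that a word might be split across a block boundary.)</p>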
<p>from piglatin import piglatin</p>
<p>class LatinIter:</p>
<p>    """Transform iterated output to piglatin, if it's okay to do so</p>
<p>    Note that the "okayness" can change until the application yields<br />
    its first non-empty string, so 'transform_ok' has to be a mutable<br />
    truth value."""</p>
<p>    def __init__(self,result,transform_ok):<br />
        if hasattr(result,'close'):<br />
            self.close = result.close<br />
        self._next = iter(result).next<br />
        self.transform_ok = transform_ok</p>
<p>    def __iter__(self):<br />
        return self</p>
<p>    def next(self):<br />
        if self.transform_ok:<br />
            return piglatin(self._next())<br />
        else:<br />
            return self._next()</p>
<p>class Latinator:</p>
<p>    # by default, don't transform output<br />
    transform = False</p>
<p>    def __init__(self, application):<br />
        self.application = application</p>
<p>    def __call__(self, environ, start_response):</p>
<p>        transform_ok = []</p>
<p>        def start_latin(status,response_headers,exc_info=None):</p>
<p>            # Reset ok flag, in case this is a repeat call<br />
            transform_ok[:]=[]</p>
<p>            for name,value in response_headers:<br />
                if name.lower()=='content-type' and value=='text/plain':<br />
                    transform_ok.append(True)<br />
                    # Strip content-length if present, else it'll be wrong<br />
                    response_headers = [(name,value)<br />
                        for name,value in response_headers<br />
                            if name.lower() != 'content-length'<br />
                    ]<br />
                    break</p>
<p>            write = start_response(status,response_headers,exc_info)</p>
<p>            if transform_ok:<br />
                def write_latin(data):<br />
                    write(piglatin(data))<br />
                return write_latin<br />
            else:<br />
                return write</p>
<p>        return LatinIter(self.application(environ,start_latin),transform_ok)</p>
<p># Run foo_app under a Latinator's control, using the example CGI gateway<br />
from foo_app import foo_app<br />
run_with_cgi(Latinator(foo_app))</p>
<p>Specification Details</p>
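<p>The application object must accept two arguments. For the sake of illustration, we have named them environ and start_response, but they are not required to have these names. A server or gateway must invoke the application object with these two arguments (for example, by calling result = application(environ,start_response) as shown above).</p>
<p>The environ parameter is a dictionary object containing CGI-style environment variables. This object must be a builtin Python dictionary (not a subclass, UserDict, or other dictionary emulation), and the application is allowed to modify the dictionary in any way it desires. The dictionary must also include certain WSGI-required variables (described in a later section), and may also include server-specific extension variables, named according to a convention described below.</p>
<p>The start_response parameter is a callable accepting two required arguments and one optional argument. For the sake of illustration, we have named these arguments status, response_headers, and exc_info. The application must invoke the start_response callable with these arguments (for example, start_response(status,response_headers)).</p>
<p>The status parameter is a status string of the form "999 Message here", and response_headers is a list of (header_name,header_value) tuples describing the HTTP response headers. The optional exc_info parameter is described below in the sections The start_response() Callable and Error Handling. It is used only when the application has trapped an error and is attempting to display an error message to the browser.</p>
<p>The start_response callable must return a write(body_data) callable that takes one positional parameter: a string to be written as part of the HTTP response body. (Note: the write() callable is provided only to support the imperative output APIs of certain existing frameworks; new applications or frameworks should avoid it if they can. See the Buffering and Streaming section for more details.)</p>
<p>When called by the server, the application object must return an iterable yielding zero or more strings. This can be accomplished in a variety of ways, such as by returning a list of strings, or by the application being a generator function that yields strings, or by the application being a class whose instances are iterable. Regardless of how it is accomplished, the application object must always return an iterable yielding zero or more strings.</p>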
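<p>The three ways of returning an iterable mentioned above can be sketched as follows. This assumes Python 3 with byte-string bodies; list_app, generator_app, IterableApp, and collect are hypothetical names, while setup_testing_defaults comes from the standard wsgiref.util module.</p>

```python
from wsgiref.util import setup_testing_defaults


def list_app(environ, start_response):
    # Style 1: return a list of byte strings.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"one ", b"two"]


def generator_app(environ, start_response):
    # Style 2: the application itself is a generator function.
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield b"one "
    yield b"two"


class IterableApp:
    # Style 3: a class whose instances are iterable.
    def __init__(self, environ, start_response):
        self.start_response = start_response

    def __iter__(self):
        self.start_response("200 OK", [("Content-Type", "text/plain")])
        yield b"one "
        yield b"two"


def collect(app):
    """Call app as a server would and gather everything it yields."""
    environ = {}
    setup_testing_defaults(environ)

    def start_response(status, response_headers, exc_info=None):
        return lambda data: None

    return list(app(environ, start_response))


outputs = [collect(app) for app in (list_app, generator_app, IterableApp)]
```

<p>From the caller's point of view the three applications are indistinguishable: each call produces an iterable that yields the same blocks.</p>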
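<p>The server or gateway must transmit the yielded strings to the client in an unbuffered fashion, completing the transmission of each string before requesting another one. (In other words, applications should perform their own buffering. See the Buffering and Streaming section below for more on how application output must be handled.)</p>
<p>The server or gateway should treat the yielded strings as binary byte sequences: in particular, it should ensure that line endings are not altered. The application is responsible for ensuring that the strings to be written are in a format suitable for the client. (The server or gateway may apply HTTP transfer encodings, or perform other transformations for the purpose of implementing HTTP features such as byte-range transmission. See Other HTTP Features, below, for more details.)</p>
<p>If a call to len(iterable) succeeds, the server must be able to rely on the result being accurate. That is, if the iterable returned by the application provides a working __len__() method, it must return an accurate result. (See the Handling the Content-Length Header section for information on how this would normally be used.)</p>
<p>If the iterable returned by the application has a close() method, the server or gateway must call that method upon completion of the current request, whether the request was completed normally or terminated early due to an error. (This is to support resource release by the application. This protocol is intended to complement PEP 325's generator support, and other common iterables with close() methods.)</p>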
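<p>The close() rule amounts to a try/finally around the server's send loop. The sketch below assumes Python 3; TrackedBody, serve, and failing_send are hypothetical names invented for this illustration.</p>

```python
class TrackedBody:
    """Hypothetical iterable whose close() releases a pretend resource."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.closed = False

    def __iter__(self):
        return iter(self.blocks)

    def close(self):
        self.closed = True


def serve(result, send):
    """Server-side loop: close() runs on success and on failure alike."""
    try:
        for data in result:
            send(data)
    finally:
        if hasattr(result, "close"):
            result.close()


# Normal completion.
sent = []
ok_body = TrackedBody([b"a", b"b"])
serve(ok_body, sent.append)


# Early termination: the client connection fails on the first block.
def failing_send(data):
    raise IOError("client went away")


failed_body = TrackedBody([b"a"])
try:
    serve(failed_body, failing_send)
except IOError:
    pass
```

<p>In both runs the closed flag ends up set, so the application can rely on close() to release whatever it was holding.</p>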
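<p>(Note: the application must invoke the start_response() callable before the iterable yields its first body string, so that the server can send the headers before any body content. However, this invocation may be performed by the iterable's first iteration, so servers must not assume that start_response() has been called before they begin iterating over the iterable.)</p>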
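<p>A generator-based application shows why a server cannot assume that start_response() has already run. The following sketch assumes Python 3; lazy_app and the recording start_response are hypothetical names used only to observe the order of events.</p>

```python
calls = []


def lazy_app(environ, start_response):
    # Generator function: nothing in this body runs until iteration begins.
    start_response("200 OK", [("Content-Type", "text/plain")])
    yield b"body"


def start_response(status, response_headers, exc_info=None):
    calls.append(status)
    return lambda data: None


result = lazy_app({}, start_response)
called_before_iteration = list(calls)    # snapshot taken before iterating

first_block = next(iter(result))
called_after_first_block = list(calls)   # snapshot taken after the first block
```

<p>Calling lazy_app only creates the generator; start_response() is reached when the server asks for the first block.</p>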
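<p>Finally, servers and gateways must not directly use any other attributes of the iterable returned by the application, unless it is an instance of a type specific to that server or gateway, such as a "file wrapper" returned by wsgi.file_wrapper (see Optional Platform-Specific File Handling). In the general case, only attributes specified here, or accessed via the PEP 234 iteration APIs, are acceptable.</p>
<p>environ Variables</p>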
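<p>The environ dictionary is required to contain these CGI environment variables, as defined by the Common Gateway Interface specification [2]. The following variables must be present, unless their value would be an empty string, in which case they may be omitted, except as otherwise noted below.</p>
<p>REQUEST_METHOD<br />
The HTTP request method, such as "GET" or "POST". This cannot ever be an empty string, and so is always required.<br />
SCRIPT_NAME<br />
The initial portion of the request URL's path that corresponds to the application object, so that the application knows its virtual location. This may be an empty string, if the application corresponds to the root of the server.<br />
PATH_INFO<br />
The remainder of the request URL's path, designating the virtual location of the request's target within the application. This may be an empty string, if the request URL targets the application root and does not have a trailing slash.<br />
QUERY_STRING<br />
The portion of the request URL that follows the "?", if any. May be empty or absent.<br />
CONTENT_TYPE<br />
The contents of any Content-Type fields in the HTTP request.<br />
CONTENT_LENGTH<br />
The contents of any Content-Length fields in the HTTP request. May be empty or absent.<br />
SERVER_NAME, SERVER_PORT<br />
When combined with SCRIPT_NAME and PATH_INFO, these variables can be used to complete the URL. Note, however, that HTTP_HOST, if present, should be used in preference to SERVER_NAME for reconstructing the request URL. See the URL Reconstruction section below for more detail. SERVER_NAME and SERVER_PORT can never be empty strings, and so are always required.<br />
SERVER_PROTOCOL<br />
The version of the protocol the client used to send the request. Typically this will be something like "HTTP/1.0" or "HTTP/1.1", and it may be used to determine how to treat the request headers. (This variable should probably be called REQUEST_PROTOCOL, since it denotes the protocol used in the request and is not necessarily the protocol that will be used in the server's response. However, for compatibility with CGI we keep the existing name.)<br />
HTTP_ Variables<br />
Variables corresponding to the client-supplied HTTP request headers (that is, variables whose names begin with "HTTP_"). The presence or absence of these variables should correspond with the presence or absence of the appropriate HTTP header in the request.</p>
<p>A server or gateway should attempt to provide as many other CGI variables as are applicable. In addition, if SSL is in use, the server or gateway should also provide as many of the Apache SSL environment variables [5] as are applicable, such as HTTPS=on and SSL_PROTOCOL. Note, however, that an application that uses any variables other than the ones listed above is necessarily non-portable to web servers that do not support the relevant extensions. (For example, web servers that do not publish files will not be able to provide a meaningful DOCUMENT_ROOT or PATH_TRANSLATED.)</p>
<p>A WSGI-compliant server or gateway should document what variables it provides. Applications should check for the presence of any variables they require, and have a fallback plan in the event such a variable is absent.</p>
<p>Note: missing variables (such as REMOTE_USER when no authentication has occurred) should be left out of the environ dictionary. Also note that CGI-defined variables must be strings, if they are present at all. It is a violation of this specification for a CGI variable's value to be of any type other than str.</p>
<p>In addition to the CGI-defined variables, the environ dictionary may also contain arbitrary operating-system environment variables, and must contain the following WSGI-defined variables:</p>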
<p>Variable	Value<br />
wsgi.version	The tuple (1,0), representing WSGI version 1.0.<br />
wsgi.url_scheme	A string representing the “scheme” portion of the URL at which the application is being invoked. Normally, this will have the value “http” or “https”, as appropriate.<br />
wsgi.input	An input stream (file-like object) from which the HTTP request body can be read. (The server or gateway may perform reads on-demand as requested by the application, or it may pre- read the client&#8217;s request body and buffer it in-memory or on disk, or use any other technique for providing such an input stream, according to its preference.)<br />
wsgi.errors	An output stream (file-like object) to which error output can be written, for the purpose of recording program or other errors in a standardized and possibly centralized location. This should be a "text mode" stream; i.e., applications should use "\n" as a line ending, and assume that it will be converted to the correct line ending by the server/gateway. For many servers, wsgi.errors will be the server's main error log. Alternatively, this may be sys.stderr, or a log file of some sort. The server's documentation should include an explanation of how to configure this or where to find the recorded output. A server or gateway may supply different error streams to different applications, if this is desired.<br />
wsgi.multithread	This value should evaluate true if the application object may be simultaneously invoked by another thread in the same process, and should evaluate false otherwise.<br />
wsgi.multiprocess	This value should evaluate true if an equivalent application object may be simultaneously invoked by another process, and should evaluate false otherwise.<br />
wsgi.run_once	This value should evaluate true if the server or gateway expects (but does not guarantee!) that the application will only be invoked this one time during the life of its containing process. Normally, this will only be true for a gateway based on CGI (or something similar).<br />
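</p>
<p>Finally, the environ dictionary may also contain server-defined variables. These variables should be named using only lower-case letters, numbers, dots, and underscores, and should be prefixed with a name that is unique to the defining server or gateway. For example, mod_python might define variables with names like mod_python.some_variable.</p>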
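<p>As a rough check of the WSGI-defined variables listed above, the sketch below builds a test environ with the standard library helper wsgiref.util.setup_testing_defaults and verifies that the seven wsgi.* keys are present. It assumes Python 3; missing_wsgi_keys and the demo_server.request_id variable are hypothetical.</p>

```python
from wsgiref.util import setup_testing_defaults

# The seven WSGI-defined keys from the table above.
REQUIRED_WSGI_KEYS = (
    "wsgi.version", "wsgi.url_scheme", "wsgi.input", "wsgi.errors",
    "wsgi.multithread", "wsgi.multiprocess", "wsgi.run_once",
)


def missing_wsgi_keys(environ):
    """Return the WSGI-defined keys that are absent from environ."""
    return [key for key in REQUIRED_WSGI_KEYS if key not in environ]


environ = {}                      # a plain builtin dict, as the text requires
setup_testing_defaults(environ)   # fills in plausible values for testing

# Hypothetical server-defined variable: lower-case, prefixed with a server name.
environ["demo_server.request_id"] = "42"

missing = missing_wsgi_keys(environ)
version = environ["wsgi.version"]
```

<p>A check like this is mainly useful in tests for a server or gateway; an application would more typically look up only the keys it needs.</p>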
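<p>Input and Error Streams</p>
<p>The input and error streams provided by the server must support the following methods:</p>
<p>Method	Stream	Notes<br />
read(size)	input	1<br />
readline()	input	1,2<br />
readlines(hint)	input	1,3<br />
__iter__()	input<br />
flush()	errors	4<br />
write(str)	errors<br />
writelines(seq)	errors</p>
<p>The semantics of each method are as documented in the Python Library Reference, except for the notes listed in the table above:</p>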
<p>1. The server is not required to read past the client's specified Content-Length, and is allowed to simulate an end-of-file condition if the application attempts to read past that point. The application should not attempt to read more data than is specified by the CONTENT_LENGTH variable.<br />
2. The optional "size" argument to readline() is not supported, as it may be complex for server authors to implement, and is not often used in practice.<br />
3. Note that the hint argument to readlines() is optional for both caller and implementer. The application is free not to supply it, and the server or gateway is free to ignore it.<br />
4. Since the errors stream may not be rewound, servers and gateways are free to forward write operations immediately, without buffering. In this case, the flush() method may be a no-op. Portable applications, however, cannot assume that output is unbuffered or that flush() is a no-op. They must call flush() if they need to ensure that output has in fact been written. (For example, to minimize intermingling of data from multiple processes writing to the same error log.)</p>
<p>The methods listed in the table above must be supported by all servers conforming to this specification. Applications conforming to this specification must not use any other methods or attributes of the input or errors objects. In particular, applications must not attempt to close these streams, even if they possess close() methods.</p>
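<p>Note 1 above suggests that an application should bound its reads by CONTENT_LENGTH. A minimal sketch, assuming Python 3 (where wsgi.input yields bytes); read_request_body is a hypothetical helper and io.BytesIO stands in for the server's input stream.</p>

```python
import io


def read_request_body(environ):
    """Read at most CONTENT_LENGTH bytes from wsgi.input."""
    try:
        length = max(int(environ.get("CONTENT_LENGTH") or 0), 0)
    except ValueError:
        length = 0   # treat a malformed length as "no body"
    if length == 0:
        return b""
    return environ["wsgi.input"].read(length)


# Simulated request: the stream holds more bytes than the declared length.
environ = {
    "CONTENT_LENGTH": "5",
    "wsgi.input": io.BytesIO(b"helloEXTRA"),
}
body = read_request_body(environ)

# CONTENT_LENGTH may be empty or absent, in which case nothing is read.
empty = read_request_body({"wsgi.input": io.BytesIO(b"ignored")})
```

<p>Only read(size) is used, which is one of the methods every conforming server must provide; the stream is never closed by the application.</p>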
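<p>The start_response() Callable</p>
<p>The second parameter passed to the application object is a callable of the form start_response(status, response_headers, exc_info=None). (As with all WSGI callables, the arguments must be supplied positionally, not by keyword.) The start_response callable is used to begin the HTTP response, and it must return a write(body_data) callable (see the Buffering and Streaming section, below).</p>
<p>The status argument is an HTTP “status” string such as “200 OK” or “404 Not Found”. That is, it is a string consisting of a status code and a reason phrase, in that order and separated by a single space, with no surrounding whitespace or other characters. (See RFC 2616, Section 6.1.1 for more information.) The string must not contain control characters, and must not be terminated with a carriage return, a linefeed, or a combination of the two.</p>
<p>The response_headers argument is a list of (header_name, header_value) tuples. It must be a Python list; that is, type(response_headers) is ListType, and the server may change its contents in any way it desires. Each header_name must be a valid HTTP header field name (as defined by RFC 2616, Section 4.2), without a trailing colon or other punctuation.</p>
<p>Each header_value must not include any control characters, including carriage returns or linefeeds. (These requirements minimize the complexity of the parsing that servers, gateways, and intermediate response processors must perform when they need to inspect or modify response headers.)</p>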
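<p>The constraints on status and response_headers can be checked mechanically. The following is a rough sketch, assuming Python 3 native strings; check_status and check_headers are hypothetical names, and the header-name pattern is deliberately stricter than the RFC 2616 token grammar (letters, digits, and hyphens only).</p>

```python
import re

_CONTROL = re.compile(r"[\x00-\x1f\x7f]")        # ASCII control characters
_STATUS = re.compile(r"[0-9]{3} \S(.*\S)?")      # code, one space, reason phrase
_NAME = re.compile(r"[A-Za-z0-9-]+")             # simplified field-name pattern


def check_status(status):
    """Raise ValueError unless status looks like '200 OK'."""
    if _CONTROL.search(status):
        raise ValueError("status contains control characters")
    if not _STATUS.fullmatch(status):
        raise ValueError("status must look like '200 OK'")


def check_headers(response_headers):
    """Raise if response_headers is not a plain list of clean (name, value) pairs."""
    if type(response_headers) is not list:
        raise TypeError("response_headers must be a Python list")
    for name, value in response_headers:
        if not _NAME.fullmatch(name):
            raise ValueError("bad header name: %r" % (name,))
        if _CONTROL.search(value):
            raise ValueError("control character in value of %s" % name)


check_status("404 Not Found")
check_headers([("Content-Type", "text/plain")])
```

<p>A server or a validating middleware could run such checks inside start_response before storing the headers.</p>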
<p>In general, the server or gateway is responsible for ensuring that correct headers are sent to the client: if the application omits a header required by HTTP (or other relevant specifications that are in effect), the server or gateway must add it. For example, the HTTP Date: and Server: headers would normally be supplied by the server or gateway.</p>
<p>(A reminder for server/gateway authors: HTTP header names are case-insensitive, so be sure to take that into consideration when examining application-supplied headers!)</p>
<p>Applications and middleware are forbidden from using HTTP/1.1 “hop-by-hop” features or headers, any equivalent features in HTTP/1.0, or any headers that would affect the persistence of the client&#8217;s connection to the web server. These features are the exclusive province of the actual web server, and a server or gateway should consider it a fatal error for an application to attempt sending them, and raise an error if they are supplied to start_response(). (For more specifics on “hop-by-hop” features and headers, please see the Other HTTP Features section below.)</p>
<p>The start_response callable must not actually transmit the response headers. Instead, it must store them for the server or gateway to transmit only after the first iteration of the application return value that yields a non-empty string, or upon the application&#8217;s first invocation of the write() callable. In other words, response headers must not be sent until there is actual body data available, or until the application&#8217;s returned iterable is exhausted. (The only possible exception to this rule is if the response headers explicitly include a Content-Length of zero.)</p>
<p>This delaying of response header transmission is to ensure that buffered and asynchronous applications can replace their originally intended output with error output, up until the last possible moment. For example, the application may need to change the response status from “200 OK” to “500 Internal Error”, if an error occurs while the body is being generated within an application buffer.</p>
<p>The exc_info argument, if supplied, must be a Python sys.exc_info() tuple. This argument should be supplied by the application only if start_response is being called by an error handler. If exc_info is supplied, and no HTTP headers have been output yet, start_response should replace the currently-stored HTTP response headers with the newly-supplied ones, thus allowing the application to “change its mind” about the output when an error has occurred.</p>
<p>However, if exc_info is provided, and the HTTP headers have already been sent, start_response must raise an error, and should raise the exc_info tuple. That is:</p>
<p>raise exc_info[0],exc_info[1],exc_info[2]<br />
This will re-raise the exception trapped by the application, and in principle should abort the application. (It is not safe for the application to attempt error output to the browser once the HTTP headers have already been sent.) The application must not trap any exceptions raised by start_response, if it called start_response with exc_info. Instead, it should allow such exceptions to propagate back to the server or gateway. See Error Handling below, for more details.</p>
<p>The application may call start_response more than once, if and only if the exc_info argument is provided. More precisely, it is a fatal error to call start_response without the exc_info argument if start_response has already been called within the current invocation of the application. (See the example CGI gateway above for an illustration of the correct logic.)</p>
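<p>The rules above (store the headers, send them only with the first non-empty body block, allow a second call only with exc_info) can be sketched as follows. This is a simplified illustration in Python 3, not the reference gateway: ResponseState, its send callable, and the HTTP/1.0 status line are assumptions made for the example.</p>

```python
import sys


class ResponseState:
    """Hypothetical per-request bookkeeping for a server or gateway."""

    def __init__(self, send):
        self.send = send            # assumed callable that transmits bytes
        self.headers_set = None     # (status, headers) stored, not yet sent
        self.headers_sent = False

    def start_response(self, status, response_headers, exc_info=None):
        if exc_info:
            try:
                if self.headers_sent:
                    # Too late to replace the response: re-raise the app's error.
                    raise exc_info[1].with_traceback(exc_info[2])
            finally:
                exc_info = None     # avoid a circular reference
        elif self.headers_set is not None:
            raise AssertionError("start_response called again without exc_info")
        self.headers_set = (status, list(response_headers))
        return self.write

    def write(self, data):
        if self.headers_set is None:
            raise AssertionError("write() called before start_response()")
        if not self.headers_sent:
            status, headers = self.headers_set
            lines = ["HTTP/1.0 %s" % status]
            lines += ["%s: %s" % pair for pair in headers]
            self.send(("\r\n".join(lines) + "\r\n\r\n").encode("iso-8859-1"))
            self.headers_sent = True
        if data:
            self.send(data)

    def run(self, app, environ):
        result = app(environ, self.start_response)
        try:
            for data in result:
                if data:            # headers wait for the first non-empty block
                    self.write(data)
            if not self.headers_sent:
                self.write(b"")     # iterable exhausted: send headers now
        finally:
            if hasattr(result, "close"):
                result.close()


def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"", b"hello"]


sent = []
ResponseState(sent.append).run(demo_app, {})
```

<p>In this sketch the leading empty block does not trigger header transmission; the headers go out together with the first block that actually carries data.</p>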
<p>Note: servers, gateways, or middleware implementing start_response should ensure that no reference is held to the exc_info parameter beyond the duration of the function&#8217;s execution, to avoid creating a circular reference through the traceback and frames involved. The simplest way to do this is something like:</p>
<p>def start_response(status,response_headers,exc_info=None):<br />
    if exc_info:<br />
        try:<br />
            # do stuff w/exc_info here<br />
            pass<br />
        finally:<br />
            exc_info = None    # Avoid circular ref.<br />
The example CGI gateway provides another illustration of this technique.</p>
<p>Handling the Content-Length Header</p>
<p>If the application does not supply a Content-Length header, a server or gateway may choose one of several approaches to handling it. The simplest of these is to close the client connection when the response is completed.</p>
<p>Under some circumstances, however, the server or gateway may be able to either generate a Content-Length header, or at least avoid the need to close the client connection. If the application does not call the write() callable, and returns an iterable whose len() is 1, then the server can automatically determine Content-Length by taking the length of the first string yielded by the iterable.</p>
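<p>A server might apply the single-element rule roughly as follows. This sketch assumes Python 3 and is deliberately conservative: it only inspects lists and tuples, and infer_content_length is a hypothetical helper name.</p>

```python
def infer_content_length(result, headers):
    """Return headers, adding Content-Length when it can be inferred (sketch)."""
    if any(name.lower() == "content-length" for name, _ in headers):
        return headers                      # the application already set it
    if isinstance(result, (list, tuple)) and len(result) == 1:
        # Exactly one body string: its length is the length of the response.
        return headers + [("Content-Length", str(len(result[0])))]
    return headers                          # unknown length: pick another strategy


headers = infer_content_length([b"hello"], [("Content-Type", "text/plain")])
```

<p>Generators and other iterables without len() fall through unchanged, so the server would then close the connection or use another strategy.</p>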
<p>And, if the server and client both support HTTP/1.1 “chunked encoding” [3], then the server may use chunked encoding to send a chunk for each write() call or string yielded by the iterable, thus generating a Content-Length header for each chunk. This allows the server to keep the client connection alive, if it wishes to do so. Note that the server must comply fully with RFC 2616 when doing this, or else fall back to one of the other strategies for dealing with the absence of Content-Length.</p>
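<p>For reference, HTTP/1.1 chunked framing prefixes each chunk with its size in hexadecimal and ends the body with a zero-length chunk. A minimal server-side sketch in Python 3 follows (encode_chunk and chunked are hypothetical names; trailers and chunk extensions are ignored):</p>

```python
LAST_CHUNK = b"0\r\n\r\n"    # zero-length chunk terminates the body


def encode_chunk(data):
    """Frame one block as a chunk: hex size, CRLF, data, CRLF."""
    return b"%x\r\n%s\r\n" % (len(data), data)


def chunked(blocks):
    """Wrap each non-empty block yielded by the application in a chunk."""
    for block in blocks:
        if block:            # an empty block would look like the terminator
            yield encode_chunk(block)
    yield LAST_CHUNK


wire = b"".join(chunked([b"hello", b"", b" world!"]))
```

<p>Only the server or gateway would do this; the application still yields plain body blocks.</p>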
<p>(Note: applications and middleware must not apply any kind of Transfer-Encoding to their output, such as chunking or gzipping; as “hop-by-hop” operations, these encodings are the province of the actual web server/gateway. See Other HTTP Features below, for more details.)</p>
<p>Buffering and Streaming</p>
<p>Generally speaking, applications will achieve the best throughput by buffering their (modestly-sized) output and sending it all at once. This is a common approach in existing frameworks such as Zope: the output is buffered in a StringIO or similar object, then transmitted all at once, along with the response headers.</p>
<p>The corresponding approach in WSGI is for the application to simply return a single-element iterable (such as a list) containing the response body as a single string. This is the recommended approach for the vast majority of application functions, that render HTML pages whose text easily fits in memory.</p>
<p>For large files, however, or for specialized uses of HTTP streaming (such as multipart “server push”), an application may need to provide output in smaller blocks (e.g. to avoid loading a large file into memory). It&#8217;s also sometimes the case that part of a response may be time-consuming to produce, but it would be useful to send ahead the portion of the response that precedes it.</p>
<p>In these cases, applications will usually return an iterator (often a generator-iterator) that produces the output in a block-by-block fashion. These blocks may be broken to coincide with multipart boundaries (for “server push”), or just before time-consuming tasks (such as reading another block of an on-disk file).</p>
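<p>A block-by-block application might look like the sketch below. It assumes Python 3 (bytes bodies); an in-memory io.BytesIO stands in for a large on-disk file, and the block_size parameter is an illustrative addition.</p>

```python
import io


def stream_file_app(environ, start_response, block_size=4):
    """Yield a file-like object's content block by block (sketch)."""
    source = io.BytesIO(b"0123456789")   # stands in for a large file
    start_response("200 OK", [("Content-Type", "application/octet-stream")])

    def blocks():
        while True:
            data = source.read(block_size)   # the time-consuming step
            if not data:
                break
            yield data

    return blocks()


calls = []
body = list(stream_file_app({}, lambda s, h, e=None: calls.append(s)))
```

<p>Each block is produced only when the server asks for the next one, so the whole file never has to sit in memory.</p>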
<p>WSGI servers, gateways, and middleware must not delay the transmission of any block; they must either fully transmit the block to the client, or guarantee that they will continue transmission even while the application is producing its next block. A server/gateway or middleware may provide this guarantee in one of three ways:</p>
<p>Send the entire block to the operating system (and request that any O/S buffers be flushed) before returning control to the application, OR<br />
Use a different thread to ensure that the block continues to be transmitted while the application produces the next block.<br />
(Middleware only) send the entire block to its parent gateway/server<br />
By providing this guarantee, WSGI allows applications to ensure that transmission will not become stalled at an arbitrary point in their output data. This is critical for proper functioning of e.g. multipart “server push” streaming, where data between multipart boundaries should be transmitted in full to the client.</p>
<p>Middleware Handling of Block Boundaries</p>
<p>In order to better support asynchronous applications and servers, middleware components must not block iteration waiting for multiple values from an application iterable. If the middleware needs to accumulate more data from the application before it can produce any output, it must yield an empty string.</p>
<p>To put this requirement another way, a middleware component must yield at least one value each time its underlying application yields a value. If the middleware cannot yield any other value, it must yield an empty string.</p>
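<p>To make the “yield an empty string” rule concrete, here is a sketch of a middleware that re-blocks the body into whole lines. It is written for Python 3 (bytes bodies), and LineBuffer is a hypothetical name. Whenever it has no complete line to pass on, it yields an empty block instead of waiting for more input.</p>

```python
class LineBuffer:
    """Hypothetical middleware: re-blocks the response body into whole lines."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Return an iterable as soon as the wrapped application does.
        return self._lines(self.app(environ, start_response))

    def _lines(self, result):
        pending = b""
        try:
            for block in result:
                pending += block
                if b"\n" in pending:
                    complete, _, pending = pending.rpartition(b"\n")
                    yield complete + b"\n"
                else:
                    yield b""        # nothing ready yet, but do not stall
            if pending:
                yield pending        # flush an unterminated last line
        finally:
            if hasattr(result, "close"):
                result.close()


def inner(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return iter([b"ab", b"c\nde", b"f\n", b"g"])


out = list(LineBuffer(inner)({}, lambda s, h, e=None: None))
```

<p>The middleware yields exactly one value for every value the application yields, plus a final flush.</p>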
<p>This requirement ensures that asynchronous applications and servers can conspire to reduce the number of threads that are required to run a given number of application instances simultaneously.</p>
<p>Note also that this requirement means that middleware must return an iterable as soon as its underlying application returns an iterable. It is also forbidden for middleware to use the write() callable to transmit data that is yielded by an underlying application. Middleware may only use their parent server&#8217;s write() callable to transmit data that the underlying application sent using a middleware-provided write() callable.</p>
<p>The write() Callable</p>
<p>Some existing application framework APIs support unbuffered output in a different manner than WSGI. Specifically, they provide a “write” function or method of some kind to write an unbuffered block of data, or else they provide a buffered “write” function and a “flush” mechanism to flush the buffer.</p>
<p>Unfortunately, such APIs cannot be implemented in terms of WSGI&#8217;s “iterable” application return value, unless threads or other special mechanisms are used.</p>
<p>Therefore, to allow these frameworks to continue using an imperative API, WSGI includes a special write() callable, returned by the start_response callable.</p>
<p>New WSGI applications and frameworks should not use the write() callable if it is possible to avoid doing so. The write() callable is strictly a hack to support imperative streaming APIs. In general, applications should produce their output via their returned iterable, as this makes it possible for web servers to interleave other tasks in the same Python thread, potentially providing better throughput for the server as a whole.</p>
<p>The write() callable is returned by the start_response() callable, and it accepts a single parameter: a string to be written as part of the HTTP response body, that is treated exactly as though it had been yielded by the output iterable. In other words, before write() returns, it must guarantee that the passed-in string was either completely sent to the client, or that it is buffered for transmission while the application proceeds onward.</p>
<p>An application must return an iterable object, even if it uses write() to produce all or part of its response body. The returned iterable may be empty (i.e. yield no non-empty strings), but if it does yield non-empty strings, that output must be treated normally by the server or gateway (i.e., it must be sent or queued immediately). Applications must not invoke write() from within their return iterable, and therefore any strings yielded by the iterable are transmitted after all strings passed to write() have been sent to the client.</p>
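<p>A legacy-style application that pushes output through write() still has to return an iterable. The sketch below assumes Python 3; fake_start_response is a stand-in for a real server and simply records what would be sent.</p>

```python
def legacy_app(environ, start_response):
    write = start_response("200 OK", [("Content-Type", "text/plain")])
    write(b"first, ")        # imperative "push" output
    write(b"second, ")
    return [b"third"]        # sent after everything passed to write()


sent = []


def fake_start_response(status, headers, exc_info=None):
    # A real server would store the headers; here write() is list.append.
    return sent.append


for block in legacy_app({}, fake_start_response):
    sent.append(block)
```

<p>The blocks arrive in the order first, second, third, matching the ordering rule described above.</p>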
<p>Unicode Issues</p>
<p>HTTP does not directly support Unicode, and neither does this interface. All encoding/decoding must be handled by the application; all strings passed to or from the server must be standard Python byte strings, not Unicode objects. The result of using a Unicode object where a string object is required, is undefined.</p>
<p>Note also that strings passed to start_response() as a status or as response headers must follow RFC 2616 with respect to encoding. That is, they must either be ISO-8859-1 characters, or use RFC 2047 MIME encoding.</p>
<p>On Python platforms where the str or StringType type is in fact Unicode-based (e.g. Jython, IronPython, Python 3000, etc.), all “strings” referred to in this specification must contain only code points representable in ISO-8859-1 encoding (\u0000 through \u00FF, inclusive). It is a fatal error for an application to supply strings containing any other Unicode character or code point. Similarly, servers and gateways must not supply strings to an application containing any other Unicode characters.</p>
<p>Again, all strings referred to in this specification must be of type str or StringType, and must not be of type unicode or UnicodeType. And, even if a given platform allows for more than 8 bits per character in str/StringType objects, only the lower 8 bits may be used, for any value referred to in this specification as a “string”.</p>
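<p>On a platform whose str type is Unicode-based, the ISO-8859-1 restriction can be tested by attempting the encoding. A small sketch for Python 3 (is_wsgi_string is a hypothetical helper name):</p>

```python
def is_wsgi_string(value):
    """True if value is a native str holding only U+0000 through U+00FF."""
    if not isinstance(value, str):
        return False
    try:
        value.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


ok = is_wsgi_string("caf\u00e9")          # U+00E9 fits in ISO-8859-1
too_wide = is_wsgi_string("\u4e2d")       # U+4E2D does not
```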
<p>Error Handling</p>
<p>In general, applications should try to trap their own, internal errors, and display a helpful message in the browser. (It is up to the application to decide what “helpful” means in this context.)</p>
<p>However, to display such a message, the application must not have actually sent any data to the browser yet, or else it risks corrupting the response. WSGI therefore provides a mechanism to either allow the application to send its error message, or be automatically aborted: the exc_info argument to start_response. Here is an example of its use:</p>
<p>try:<br />
    # regular application code here<br />
    status = "200 Froody"<br />
    response_headers = [("content-type","text/plain")]<br />
    start_response(status, response_headers)<br />
    return ["normal body goes here"]<br />
except:<br />
    # XXX should trap runtime issues like MemoryError, KeyboardInterrupt<br />
    #     in a separate handler before this bare 'except:'...<br />
    status = "500 Oops"<br />
    response_headers = [("content-type","text/plain")]<br />
    start_response(status, response_headers, sys.exc_info())<br />
    return ["error body goes here"]<br />
If no output has been written when an exception occurs, the call to start_response will return normally, and the application will return an error body to be sent to the browser. However, if any output has already been sent to the browser, start_response will reraise the provided exception. This exception should not be trapped by the application, and so the application will abort. The server or gateway can then trap this (fatal) exception and abort the response.</p>
<p>Servers should trap and log any exception that aborts an application or the iteration of its return value. If a partial response has already been written to the browser when an application error occurs, the server or gateway may attempt to add an error message to the output, if the already-sent headers indicate a text/* content type that the server knows how to modify cleanly.</p>
<p>Some middleware may wish to provide additional exception handling services, or intercept and replace application error messages. In such cases, middleware may choose to not re-raise the exc_info supplied to start_response, but instead raise a middleware-specific exception, or simply return without an exception after storing the supplied arguments. This will then cause the application to return its error body iterable (or invoke write()), allowing the middleware to capture and modify the error output. These techniques will work as long as application authors:</p>
<p>Always provide exc_info when beginning an error response<br />
Never trap errors raised by start_response when exc_info is being provided</p>
<p>HTTP 1.1 Expect/Continue</p>
<p>Servers and gateways that implement HTTP 1.1 must provide transparent support for HTTP 1.1&#8217;s “expect/continue” mechanism. This may be done in any of several ways:</p>
<p>Respond to requests containing an Expect: 100-continue request with an immediate “100 Continue” response, and proceed normally.<br />
Proceed with the request normally, but provide the application with a wsgi.input stream that will send the “100 Continue” response if/when the application first attempts to read from the input stream. The read request must then remain blocked until the client responds.<br />
Wait until the client decides that the server does not support expect/continue, and sends the request body on its own. (This is suboptimal, and is not recommended.)<br />
Note that these behavior restrictions do not apply for HTTP 1.0 requests, or for requests that are not directed to an application object. For more information on HTTP 1.1 Expect/Continue, see RFC 2616, sections 8.2.3 and 10.1.1.</p>
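<p>The second strategy (send “100 Continue” lazily, on the first read) could be sketched as an input-stream wrapper like the one below. It assumes Python 3; ContinueOnRead and the send_continue callback are hypothetical, and a real server would write the interim response to the client socket.</p>

```python
import io


class ContinueOnRead:
    """Hypothetical wsgi.input wrapper that defers the 100 Continue response."""

    def __init__(self, stream, send_continue):
        self._stream = stream
        self._send_continue = send_continue   # assumed: emits "100 Continue"
        self._pending = True

    def _ensure_continue(self):
        if self._pending:
            self._pending = False             # send the interim response once
            self._send_continue()

    def read(self, size=-1):
        self._ensure_continue()
        return self._stream.read(size)

    def readline(self):
        self._ensure_continue()
        return self._stream.readline()

    def readlines(self, hint=-1):
        self._ensure_continue()
        return self._stream.readlines(hint)

    def __iter__(self):
        self._ensure_continue()
        return iter(self._stream)


events = []
wrapped = ContinueOnRead(io.BytesIO(b"body"), lambda: events.append("100 Continue"))
before = list(events)          # nothing sent until the application reads
first = wrapped.read(2)
rest = wrapped.read()
```

<p>If the application never touches the input stream, no interim response is sent at all.</p>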
<p>Other HTTP Features</p>
<p>In general, servers and gateways should “play dumb” and allow the application complete control over its output. They should only make changes that do not alter the effective semantics of the application&#8217;s response. It is always possible for the application developer to add middleware components to supply additional features, so server/gateway developers should be conservative in their implementation. In a sense, a server should consider itself to be like an HTTP “gateway server”, with the application being an HTTP “origin server”. (See RFC 2616, section 1.3, for the definition of these terms.)</p>
<p>However, because WSGI servers and applications do not communicate via HTTP, what RFC 2616 calls “hop-by-hop” headers do not apply to WSGI internal communications. WSGI applications must not generate any “hop-by-hop” headers [4], attempt to use HTTP features that would require them to generate such headers, or rely on the content of any incoming “hop-by-hop” headers in the environ dictionary. WSGI servers must handle any supported inbound “hop-by-hop” headers on their own, such as by decoding any inbound Transfer-Encoding, including chunked encoding if applicable.</p>
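<p>A server or middleware can detect forbidden headers with wsgiref.util.is_hop_by_hop from the Python standard library. The wrapper function below, reject_hop_by_hop, is a hypothetical helper shown only as a sketch.</p>

```python
from wsgiref.util import is_hop_by_hop


def reject_hop_by_hop(response_headers):
    """Raise if the application supplied any hop-by-hop header (sketch)."""
    for name, _ in response_headers:
        if is_hop_by_hop(name):     # the comparison is case-insensitive
            raise AssertionError("hop-by-hop header not allowed: %s" % name)
    return response_headers


checked = reject_hop_by_hop([("Content-Type", "text/plain")])
```

<p>Raising an error here matches the requirement that a server treat such headers as a fatal application error.</p>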
<p>Applying these principles to a variety of HTTP features, it should be clear that a server may handle cache validation via the If-None-Match and If-Modified-Since request headers and the Last-Modified and ETag response headers. However, it is not required to do this, and the application should perform its own cache validation if it wants to support that feature, since the server/gateway is not required to do such validation.</p>
<p>Similarly, a server may re-encode or transport-encode an application&#8217;s response, but the application should use a suitable content encoding on its own, and must not apply a transport encoding. A server may transmit byte ranges of the application&#8217;s response if requested by the client, and the application doesn&#8217;t natively support byte ranges. Again, however, the application should perform this function on its own if desired.</p>
<p>Note that these restrictions on applications do not necessarily mean that every application must reimplement every HTTP feature; many HTTP features can be partially or fully implemented by middleware components, thus freeing both server and application authors from implementing the same features over and over again.</p>
<p>Thread Support</p>
<p>Thread support, or lack thereof, is also server-dependent. Servers that can run multiple requests in parallel, should also provide the option of running an application in a single-threaded fashion, so that applications or frameworks that are not thread-safe may still be used with that server.</p>
<p>Implementation/Application Notes</p>
<p>Server Extension APIs</p>
<p>Some server authors may wish to expose more advanced APIs, that application or framework authors can use for specialized purposes. For example, a gateway based on mod_python might wish to expose part of the Apache API as a WSGI extension.</p>
<p>In the simplest case, this requires nothing more than defining an environ variable, such as mod_python.some_api. But, in many cases, the possible presence of middleware can make this difficult. For example, an API that offers access to the same HTTP headers that are found in environ variables, might return different data if environ has been modified by middleware.</p>
<p>In general, any extension API that duplicates, supplants, or bypasses some portion of WSGI functionality runs the risk of being incompatible with middleware components. Server/gateway developers should not assume that nobody will use middleware, because some framework developers specifically intend to organize or reorganize their frameworks to function almost entirely as middleware of various kinds.</p>
<p>So, to provide maximum compatibility, servers and gateways that provide extension APIs that replace some WSGI functionality, must design those APIs so that they are invoked using the portion of the API that they replace. For example, an extension API to access HTTP request headers must require the application to pass in its current environ, so that the server/gateway may verify that HTTP headers accessible via the API have not been altered by middleware. If the extension API cannot guarantee that it will always agree with environ about the contents of HTTP headers, it must refuse service to the application, e.g. by raising an error, returning None instead of a header collection, or whatever is appropriate to the API.</p>
<p>Similarly, if an extension API provides an alternate means of writing response data or headers, it should require the start_response callable to be passed in, before the application can obtain the extended service. If the object passed in is not the same one that the server/gateway originally supplied to the application, it cannot guarantee correct operation and must refuse to provide the extended service to the application.</p>
<p>These guidelines also apply to middleware that adds information such as parsed cookies, form variables, sessions, and the like to environ. Specifically, such middleware should provide these features as functions which operate on environ, rather than simply stuffing values into environ. This helps ensure that information is calculated from environ after any middleware has done any URL rewrites or other environ modifications.</p>
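<p>As a sketch of this guideline, cookie parsing can be offered as a function that works from the current environ rather than as a value stored once. The example assumes Python 3; get_cookies and the example.cookies key are hypothetical names.</p>

```python
from http.cookies import SimpleCookie


def get_cookies(environ):
    """Parse cookies from the *current* environ on demand (hypothetical helper)."""
    header = environ.get("HTTP_COOKIE", "")
    cached = environ.get("example.cookies")
    if cached is not None and cached[0] == header:
        return cached[1]            # header unchanged since the last parse
    cookies = {key: morsel.value for key, morsel in SimpleCookie(header).items()}
    environ["example.cookies"] = (header, cookies)
    return cookies


environ = {"HTTP_COOKIE": "a=1; b=2"}
first = get_cookies(environ)
environ["HTTP_COOKIE"] = "a=9"      # e.g. rewritten by other middleware
second = get_cookies(environ)
```

<p>Because the cache is keyed on the header value, a later change to environ is picked up instead of being silently ignored.</p>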
<p>It is very important that these “safe extension” rules be followed by both server/gateway and middleware developers, in order to avoid a future in which middleware developers are forced to delete any and all extension APIs from environ to ensure that their mediation isn&#8217;t being bypassed by applications using those extensions!</p>
<p>Application Configuration</p>
<p>This specification does not define how a server selects or obtains an application to invoke. These and other configuration options are highly server-specific matters. It is expected that server/gateway authors will document how to configure the server to execute a particular application object, and with what options (such as threading options).</p>
<p>Framework authors, on the other hand, should document how to create an application object that wraps their framework&#8217;s functionality. The user, who has chosen both the server and the application framework, must connect the two together. However, since both the framework and the server now have a common interface, this should be merely a mechanical matter, rather than a significant engineering effort for each new server/framework pair.</p>
<p>Finally, some applications, frameworks, and middleware may wish to use the environ dictionary to receive simple string configuration options. Servers and gateways should support this by allowing an application&#8217;s deployer to specify name-value pairs to be placed in environ. In the simplest case, this support can consist merely of copying all operating system-supplied environment variables from os.environ into the environ dictionary, since the deployer in principle can configure these externally to the server, or in the CGI case they may be able to be set via the server&#8217;s configuration files.</p>
<p>Applications should try to keep such required variables to a minimum, since not all servers will support easy configuration of them. Of course, even in the worst case, persons deploying an application can create a script to supply the necessary configuration values:</p>
<p>from the_app import application</p>
<p>def new_app(environ,start_response):<br />
    environ['the_app.configval1'] = 'something'<br />
    return application(environ,start_response)<br />
But, most existing applications and frameworks will probably only need a single configuration value from environ, to indicate the location of their application or framework-specific configuration file(s). (Of course, applications should cache such configuration, to avoid having to re-read it upon each invocation.)</p>
<p>URL Reconstruction</p>
<p>If an application wishes to reconstruct a request&#8217;s complete URL, it may do so using the following algorithm, contributed by Ian Bicking:</p>
<p>from urllib import quote<br />
url = environ['wsgi.url_scheme'] + '://'</p>
<p>if environ.get('HTTP_HOST'):<br />
    url += environ['HTTP_HOST']<br />
else:<br />
    url += environ['SERVER_NAME']</p>
<p>    if environ['wsgi.url_scheme'] == 'https':<br />
        if environ['SERVER_PORT'] != '443':<br />
            url += ':' + environ['SERVER_PORT']<br />
    else:<br />
        if environ['SERVER_PORT'] != '80':<br />
            url += ':' + environ['SERVER_PORT']</p>
<p>url += quote(environ.get('SCRIPT_NAME', ''))<br />
url += quote(environ.get('PATH_INFO', ''))<br />
if environ.get('QUERY_STRING'):<br />
    url += '?' + environ['QUERY_STRING']<br />
Note that such a reconstructed URL may not be precisely the same URI as requested by the client. Server rewrite rules, for example, may have modified the client&#8217;s originally requested URL to place it in a canonical form.</p>
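<p>On Python 3, the standard library module wsgiref.util provides request_uri(), which performs a comparable reconstruction. The environ values below are made up for the example.</p>

```python
from wsgiref.util import request_uri

# Illustrative environ; a real server or gateway fills in these keys.
environ = {
    "wsgi.url_scheme": "http",
    "SERVER_NAME": "localhost",
    "SERVER_PORT": "8080",
    "SCRIPT_NAME": "/app",
    "PATH_INFO": "/a b",
    "QUERY_STRING": "x=1",
}

url = request_uri(environ)
```

<p>Since there is no HTTP_HOST key, the host part is built from SERVER_NAME plus the non-default port, and the space in PATH_INFO is percent-encoded.</p>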
<p>Supporting Older (&lt;2.2) Versions of Python</p>
<p>Some servers, gateways, or applications may wish to support older (&lt;2.2) versions of Python. This is especially important if Jython is a target platform, since as of this writing a production-ready version of Jython 2.2 is not yet available.</p>
<p>For servers and gateways, this is relatively straightforward: servers and gateways targeting pre-2.2 versions of Python must simply restrict themselves to using only a standard “for” loop to iterate over any iterable returned by an application. This is the only way to ensure source-level compatibility with both the pre-2.2 iterator protocol (discussed further below) and “today&#8217;s” iterator protocol (see PEP 234).</p>
<p>(Note that this technique necessarily applies only to servers, gateways, or middleware that are written in Python. Discussion of how to use iterator protocol(s) correctly from other languages is outside the scope of this PEP.)</p>
<p>For applications, supporting pre-2.2 versions of Python is slightly more complex:</p>
<p>You may not return a file object and expect it to work as an iterable, since before Python 2.2, files were not iterable. (In general, you shouldn&#8217;t do this anyway, because it will perform quite poorly most of the time!) Use wsgi.file_wrapper or an application-specific file wrapper class. (See Optional Platform-Specific File Handling for more on wsgi.file_wrapper, and an example class you can use to wrap a file as an iterable.)<br />
If you return a custom iterable, it must implement the pre-2.2 iterator protocol. That is, provide a __getitem__ method that accepts an integer key, and raises IndexError when exhausted. (Note that built-in sequence types are also acceptable, since they also implement this protocol.)<br />
Finally, middleware that wishes to support pre-2.2 versions of Python, and iterates over application return values or itself returns an iterable (or both), must follow the appropriate recommendations above.</p>
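<p>The pre-2.2 protocol mentioned above is still honored by a plain “for” loop in current Python versions, which makes it easy to demonstrate. In this sketch (Python 3 syntax, bytes blocks) the Countdown class is a made-up example that defines only __getitem__ and no __iter__.</p>

```python
class Countdown:
    """Iterable through the legacy sequence protocol only (no __iter__)."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        # The for loop calls this with 0, 1, 2, ... until IndexError.
        if index >= self.start:
            raise IndexError(index)
        return b"%d\n" % (self.start - index)


blocks = [block for block in Countdown(3)]
```

<p>The loop stops cleanly when IndexError is raised, which is exactly how a server restricted to a standard “for” loop would consume such an object.</p>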
<p>(Note: It should go without saying that to support pre-2.2 versions of Python, any server, gateway, application, or middleware must also use only language features available in the target version, use 1 and 0 instead of True and False, etc.)</p>
<p>Optional Platform-Specific File Handling</p>
<p>Some operating environments provide special high-performance file-transmission facilities, such as the Unix sendfile() call. Servers and gateways may expose this functionality via an optional wsgi.file_wrapper key in the environ. An application may use this “file wrapper” to convert a file or file-like object into an iterable that it then returns, e.g.:</p>
<p>if 'wsgi.file_wrapper' in environ:<br />
    return environ['wsgi.file_wrapper'](filelike, block_size)<br />
else:<br />
    return iter(lambda: filelike.read(block_size), '')<br />
If the server or gateway supplies wsgi.file_wrapper, it must be a callable that accepts one required positional parameter, and one optional positional parameter. The first parameter is the file-like object to be sent, and the second parameter is an optional block size “suggestion” (which the server/gateway need not use). The callable must return an iterable object, and must not perform any data transmission until and unless the server/gateway actually receives the iterable as a return value from the application. (To do otherwise would prevent middleware from being able to interpret or override the response data.)</p>
<p>To be considered “file-like”, the object supplied by the application must have a read() method that takes an optional size argument. It may have a close() method, and if so, the iterable returned by wsgi.file_wrapper must have a close() method that invokes the original file-like object&#8217;s close() method. If the “file-like” object has any other methods or attributes with names matching those of Python built-in file objects (e.g. fileno()), the wsgi.file_wrapper may assume that these methods or attributes have the same semantics as those of a built-in file object.</p>
<p>The actual implementation of any platform-specific file handling must occur after the application returns, and the server or gateway checks to see if a wrapper object was returned. (Again, because of the presence of middleware, error handlers, and the like, it is not guaranteed that any wrapper created will actually be used.)</p>
<p>Apart from the handling of close(), the semantics of returning a file wrapper from the application should be the same as if the application had returned iter(filelike.read, ''). In other words, transmission should begin at the current position within the “file” at the time that transmission begins, and continue until the end is reached.</p>
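<p>The “current position” rule can be checked with a small experiment. The sketch below uses Python 3 and the standard library&#8217;s wsgiref.util.FileWrapper as one example of a wrapper; remaining_blocks is a hypothetical helper that spells out the plain-iterator behavior.</p>

```python
import io
from wsgiref.util import FileWrapper


def remaining_blocks(filelike, block_size):
    """Blocks a wrapper should transmit: current position through end of file."""
    return list(iter(lambda: filelike.read(block_size), b""))


source = io.BytesIO(b"headerBODY")
source.seek(6)                       # pretend 6 bytes were already consumed
expected = remaining_blocks(source, 2)

source.seek(6)                       # same starting point for the wrapper
wrapped = list(FileWrapper(source, 2))
```

<p>Both approaches skip the first six bytes and deliver only what follows the current position.</p>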
<p>Of course, platform-specific file transmission APIs don&#8217;t usually accept arbitrary “file-like” objects. Therefore, a wsgi.file_wrapper has to introspect the supplied object for things such as a fileno() (Unix-like OSes) or a java.nio.FileChannel (under Jython) in order to determine if the file-like object is suitable for use with the platform-specific API it supports.</p>
<p>Note that even if the object is not suitable for the platform API, the wsgi.file_wrapper must still return an iterable that wraps read() and close(), so that applications using file wrappers are portable across platforms. Here&#8217;s a simple platform-agnostic file wrapper class, suitable for old (pre 2.2) and new Pythons alike:</p>
<p>class FileWrapper:</p>
<p>    def __init__(self, filelike, blksize=8192):<br />
        self.filelike = filelike<br />
        self.blksize = blksize<br />
        if hasattr(filelike, 'close'):<br />
            self.close = filelike.close</p>
<p>    def __getitem__(self,key):<br />
        data = self.filelike.read(self.blksize)<br />
        if data:<br />
            return data<br />
        raise IndexError<br />
and here is a snippet from a server/gateway that uses it to provide access to a platform-specific API:</p>
<div class="wp_syntax"><div class="code"><pre class="python" style="font-family:monospace;">environ['wsgi.file_wrapper'] = FileWrapper
result = application(environ, start_response)

try:
    if isinstance(result, FileWrapper):
        # check if result.filelike is usable w/platform-specific
        # API, and if so, use that API to transmit the result.
        # If not, fall through to normal iterable handling
        # loop below.
        pass

    for data in result:
        # etc.
        pass

finally:
    if hasattr(result, 'close'):
        result.close()</pre></div></div>
<p>Questions and Answers</p>
<p>Why must environ be a dictionary? What&#8217;s wrong with using a subclass?</p>
<p>The rationale for requiring a dictionary is to maximize portability between servers. The alternative would be to define some subset of a dictionary&#8217;s methods as being the standard and portable interface. In practice, however, most servers will probably find a dictionary adequate to their needs, and thus framework authors will come to expect the full set of dictionary features to be available, since they will be there more often than not. But, if some server chooses not to use a dictionary, then there will be interoperability problems despite that server&#8217;s “conformance” to spec. Therefore, making a dictionary mandatory simplifies the specification and guarantees interoperability.</p>
<p>Note that this does not prevent server or framework developers from offering specialized services as custom variables inside the environ dictionary. This is the recommended approach for offering any such value-added services.</p>
<p>Why can you call write() and yield strings/return an iterable? Shouldn&#8217;t we pick just one way?</p>
<p>If we supported only the iteration approach, then current frameworks that assume the availability of “push” suffer. But, if we only support pushing via write(), then server performance suffers for transmission of e.g. large files (if a worker thread can&#8217;t begin work on a new request until all of the output has been sent). Thus, this compromise allows an application framework to support both approaches, as appropriate, but with only a little more burden to the server implementor than a push-only approach would require.</p>
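<p>The two output styles can be sketched side by side. This is an illustration only: run() is a hypothetical toy harness rather than a real server, the applications are made up, and the body blocks are byte strings on the assumption of a Python 3 interpreter.</p>

```python
def pull_app(environ, start_response):
    # "Pull" style: the server iterates over the returned blocks.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'Hello, ', b'world']


def push_app(environ, start_response):
    # "Push" style: the application calls write() itself.
    write = start_response('200 OK', [('Content-Type', 'text/plain')])
    write(b'Hello, ')
    write(b'world')
    return []


def run(app):
    """Toy harness: collect everything an app sends, whichever style it uses."""
    pushed = []

    def start_response(status, headers):
        # A real server would also record the status and headers.
        return pushed.append

    result = app({}, start_response)
    try:
        return b''.join(pushed + list(result))
    finally:
        if hasattr(result, 'close'):
            result.close()
```

<p>Both applications produce the same bytes; the only extra burden on the harness is handing back a write callable.</p>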
<p>What&#8217;s the close() for?</p>
<p>When writes are done during the execution of an application object, the application can ensure that resources are released using a try/finally block. But, if the application returns an iterable, any resources used will not be released until the iterable is garbage collected. The close() idiom allows an application to release critical resources at the end of a request, and it&#8217;s forward-compatible with the support for try/finally in generators that&#8217;s proposed by PEP 325.</p>
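<p>As a rough sketch of the close() idiom, the iterable below releases a resource when the server calls close(). The class name, the serve() helper and the list used as a stand-in resource are all hypothetical.</p>

```python
class LoggedBody:
    """Iterable response body that releases a resource in close()."""

    def __init__(self, blocks, log):
        self.blocks = blocks
        self.log = log  # hypothetical stand-in for a critical resource

    def __iter__(self):
        return iter(self.blocks)

    def close(self):
        self.log.append('released')


def serve(body):
    """What a server is expected to do: always call close() if present."""
    sent = []
    try:
        for block in body:
            sent.append(block)
    finally:
        if hasattr(body, 'close'):
            body.close()
    return sent


log = []
sent = serve(LoggedBody([b'a', b'b'], log))
```

<p>The release happens at the end of the request instead of waiting for garbage collection.</p>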
<p>Why is this interface so low-level? I want feature X! (e.g. cookies, sessions, persistence, &#8230;)</p>
<p>This isn&#8217;t Yet Another Python Web Framework. It&#8217;s just a way for frameworks to talk to web servers, and vice versa. If you want these features, you need to pick a web framework that provides the features you want. And if that framework lets you create a WSGI application, you should be able to run it in most WSGI-supporting servers. Also, some WSGI servers may offer additional services via objects provided in their environ dictionary; see the applicable server documentation for details. (Of course, applications that use such extensions will not be portable to other WSGI-based servers.)</p>
<p>Why use CGI variables instead of good old HTTP headers? And why mix them in with WSGI-defined variables?</p>
<p>Many existing web frameworks are built heavily upon the CGI spec, and existing web servers know how to generate CGI variables. In contrast, alternative ways of representing inbound HTTP information are fragmented and lack market share. Thus, using the CGI “standard” seems like a good way to leverage existing implementations. As for mixing them with WSGI variables, separating them would just require two dictionary arguments to be passed around, while providing no real benefits.</p>
<p>What about the status string? Can&#8217;t we just use the number, passing in 200 instead of “200 OK”?</p>
<p>Doing this would complicate the server or gateway, by requiring them to have a table of numeric statuses and corresponding messages. By contrast, it is easy for an application or framework author to type the extra text to go with the specific response code they are using, and existing frameworks often already have a table containing the needed messages. So, on balance it seems better to make the application/framework responsible, rather than the server or gateway.</p>
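<p>For applications that would rather not type the reason phrase by hand, one possible table is http.HTTPStatus from the Python standard library (available from Python 3.5, so much later than this text). The status_line() helper is a hypothetical name used only for this sketch.</p>

```python
from http import HTTPStatus


def status_line(code):
    """Build the 'code reason' string that start_response() expects."""
    status = HTTPStatus(code)
    return '%d %s' % (status.value, status.phrase)
```

<p>The application or framework stays responsible for the text, as the answer above argues, while the server simply passes the string along.</p>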
<p>Why is wsgi.run_once not guaranteed to run the app only once?</p>
<p>Because it&#8217;s merely a suggestion to the application that it should “rig for infrequent running”. This is intended for application frameworks that have multiple modes of operation for caching, sessions, and so forth. In a “multiple run” mode, such frameworks may preload caches, and may not write e.g. logs or session data to disk after each request. In “single run” mode, such frameworks avoid preloading and flush all necessary writes after each request.</p>
<p>However, in order to test an application or framework to verify correct operation in the latter mode, it may be necessary (or at least expedient) to invoke it more than once. Therefore, an application should not assume that it will definitely not be run again, just because it is called with wsgi.run_once set to True.</p>
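<p>A minimal sketch of how a framework might treat wsgi.run_once as a hint, assuming a made-up CachingApp class with a preloaded cache and a flush counter. Neither comes from the spec.</p>

```python
class CachingApp:
    """Hypothetical framework object with two operating modes."""

    def __init__(self):
        self.cache = None
        self.flushed = 0

    def __call__(self, environ, start_response):
        if environ.get('wsgi.run_once'):
            # "Single run" mode: skip preloading, flush after the request.
            self.flushed += 1
        elif self.cache is None:
            # "Multiple run" mode: preload once and reuse across requests.
            self.cache = {'greeting': b'hi'}
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'hi']


def ignore(status, headers):
    """Toy start_response that discards its arguments."""


app = CachingApp()
# A test harness may call the app repeatedly even with run_once set.
app({'wsgi.run_once': True}, ignore)
app({'wsgi.run_once': True}, ignore)
```

<p>The application keeps working on the second call because it never assumed the first call would be the last.</p>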
<p>Feature X (dictionaries, callables, etc.) are ugly for use in application code; why don&#8217;t we use objects instead?</p>
<p>All of these implementation choices of WSGI are specifically intended to decouple features from one another; recombining these features into encapsulated objects makes it somewhat harder to write servers or gateways, and an order of magnitude harder to write middleware that replaces or modifies only small portions of the overall functionality.</p>
<p>In essence, middleware wants to have a “Chain of Responsibility” pattern, whereby it can act as a “handler” for some functions, while allowing others to remain unchanged. This is difficult to do with ordinary Python objects, if the interface is to remain extensible. For example, one must use __getattr__ or __getattribute__ overrides, to ensure that extensions (such as attributes defined by future WSGI versions) are passed through.</p>
<p>This type of code is notoriously difficult to get 100% correct, and few people will want to write it themselves. They will therefore copy other people&#8217;s implementations, but fail to update them when the person they copied from corrects yet another corner case.</p>
<p>Further, this necessary boilerplate would be pure excise, a developer tax paid by middleware developers to support a slightly prettier API for application framework developers. But, application framework developers will typically only be updating one framework to support WSGI, and in a very limited part of their framework as a whole. It will likely be their first (and maybe their only) WSGI implementation, and thus they will likely implement with this specification ready to hand. Thus, the effort of making the API “prettier” with object attributes and suchlike would likely be wasted for this audience.</p>
<p>We encourage those who want a prettier (or otherwise improved) WSGI interface for use in direct web application programming (as opposed to web framework development) to develop APIs or frameworks that wrap WSGI for convenient use by application developers. In this way, WSGI can remain conveniently low-level for server and middleware authors, while not being “ugly” for application developers.</p>
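<p>The “Chain of Responsibility” argument above can be sketched with plain callables. The add_header() middleware, the hello application and the record() stand-in for a server's start_response are hypothetical names chosen for this illustration.</p>

```python
def add_header(app, name, value):
    """Middleware that appends one response header and changes nothing else."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            # Handle only the headers; pass everything else through.
            return start_response(status, headers + [(name, value)], exc_info)
        return app(environ, patched_start)
    return wrapped


def hello(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello']


seen = {}


def record(status, headers, exc_info=None):
    # Toy stand-in for a server's start_response.
    seen['status'], seen['headers'] = status, headers


body = add_header(hello, 'X-Demo', 'yes')({}, record)
```

<p>Because environ is a dictionary and start_response is a callable, the middleware touches one thing and forwards the rest without any __getattr__ boilerplate.</p>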
<p>Proposed/Under Discussion</p>
<p>These items are currently being discussed on the Web-SIG and elsewhere, or are on the PEP author&#8217;s “to-do” list:</p>
<p>Should wsgi.input be an iterator instead of a file? This would help for asynchronous applications and chunked-encoding input streams.<br />
Optional extensions are being discussed for pausing iteration of an application&#8217;s output until input is available or until a callback occurs.<br />
Add a section about synchronous vs. asynchronous apps and servers, the relevant threading models, and issues/design goals in these areas.</p>
<p>Acknowledgements</p>
<p>Thanks go to the many folks on the Web-SIG mailing list whose thoughtful feedback made this revised draft possible. Especially:</p>
<p>Gregory “Grisha” Trubetskoy, author of mod_python, who beat up on the first draft as not offering any advantages over “plain old CGI”, thus encouraging me to look for a better approach.<br />
Ian Bicking, who helped nag me into properly specifying the multithreading and multiprocess options, as well as badgering me to provide a mechanism for servers to supply custom extension data to an application.<br />
Tony Lownds, who came up with the concept of a start_response function that took the status and headers, returning a write function. His input also guided the design of the exception handling facilities, especially in the area of allowing for middleware that overrides application error messages.<br />
Alan Kennedy, whose courageous attempts to implement WSGI-on-Jython (well before the spec was finalized) helped to shape the “supporting older versions of Python” section, as well as the optional wsgi.file_wrapper facility.<br />
Mark Nottingham, who reviewed the spec extensively for issues with HTTP RFC compliance, especially with regard to HTTP/1.1 features that I didn&#8217;t even know existed until he pointed them out.</p>
<p>References</p>
<p>[1]	The Python Wiki “Web Programming” topic (http://www.python.org/cgi-bin/moinmoin/WebProgramming)<br />
[2]	The Common Gateway Interface Specification, v 1.1, 3rd Draft (http://cgi-spec.golux.com/draft-coar-cgi-v11-03.txt)<br />
[3]	“Chunked Transfer Coding” &#8212; HTTP/1.1, section 3.6.1 (http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1)<br />
[4]	“End-to-end and Hop-by-hop Headers” &#8212; HTTP/1.1, Section 13.5.1 (http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.5.1)<br />
[5]	mod_ssl Reference, “Environment Variables” (http://www.modssl.org/docs/2.8/ssl_reference.html#ToC25)</p>
<p>Copyright</p>
<p>This document has been placed in the public domain.</p>
]]></content:encoded>
			<wfw:commentRss>http://fendou.org/2011/11/23/python-web-server-gateway-interface-v1-0-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>UNION and ORDER BY in MySQL: Issues and Precedence</title>
		<link>http://fendou.org/2011/11/16/mysql-union-order-by-problem/</link>
		<comments>http://fendou.org/2011/11/16/mysql-union-order-by-problem/#comments</comments>
		<pubDate>Wed, 16 Nov 2011 07:03:10 +0000</pubDate>
		<dc:creator>崔玉松</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[Excellence Article]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://fendou.org/?p=854</guid>
		<description><![CDATA[The MySQL reference manual does not spell out the relative precedence of UNION and ORDER BY. What it recommends is wrapping the SQL statements in parentheses, which makes the semantics of the query clearer. For example, to apply ORDER BY to the result of the union: &#40;SELECT a FROM tbl_name WHERE a=10 AND B=1&#41; UNION &#40;SELECT a FROM tbl_name WHERE a=11 AND B=2&#41; ORDER BY a LIMIT 10; To apply ORDER BY to an individual SELECT instead, put the ORDER BY clause inside the parentheses, as follows: &#40;SELECT a FROM tbl_name WHERE a=10 AND B=1 ORDER BY a LIMIT 10&#41; UNION &#40;SELECT a FROM tbl_name WHERE a=11 AND B=2 ORDER BY a...  <a href="http://fendou.org/2011/11/16/mysql-union-order-by-problem/" class="more-link" title="Read UNION and ORDER BY in MySQL: Issues and Precedence">Read more &#187;</a>]]></description>
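			<content:encoded><![CDATA[<p>The MySQL reference manual does not spell out the relative precedence of UNION and ORDER BY.<br />
What it recommends is wrapping the SQL statements in parentheses, which makes the semantics of the query clearer.<br />
For example, to apply ORDER BY to the result of the union:</p>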

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">SELECT</span> a <span style="color: #993333; font-weight: bold;">FROM</span> tbl_name <span style="color: #993333; font-weight: bold;">WHERE</span> a<span style="color: #66cc66;">=</span><span style="color: #cc66cc;">10</span> <span style="color: #993333; font-weight: bold;">AND</span> B<span style="color: #66cc66;">=</span><span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span>  
<span style="color: #993333; font-weight: bold;">UNION</span>  
<span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">SELECT</span> a <span style="color: #993333; font-weight: bold;">FROM</span> tbl_name <span style="color: #993333; font-weight: bold;">WHERE</span> a<span style="color: #66cc66;">=</span><span style="color: #cc66cc;">11</span> <span style="color: #993333; font-weight: bold;">AND</span> B<span style="color: #66cc66;">=</span><span style="color: #cc66cc;">2</span><span style="color: #66cc66;">&#41;</span>  
<span style="color: #993333; font-weight: bold;">ORDER</span> <span style="color: #993333; font-weight: bold;">BY</span> a <span style="color: #993333; font-weight: bold;">LIMIT</span> <span style="color: #cc66cc;">10</span>;</pre></div></div>

<p>To apply ORDER BY to an individual SELECT instead, put the ORDER BY clause inside the parentheses, as follows:</p>

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">SELECT</span> a <span style="color: #993333; font-weight: bold;">FROM</span> tbl_name <span style="color: #993333; font-weight: bold;">WHERE</span> a<span style="color: #66cc66;">=</span><span style="color: #cc66cc;">10</span> <span style="color: #993333; font-weight: bold;">AND</span> B<span style="color: #66cc66;">=</span><span style="color: #cc66cc;">1</span> <span style="color: #993333; font-weight: bold;">ORDER</span> <span style="color: #993333; font-weight: bold;">BY</span> a <span style="color: #993333; font-weight: bold;">LIMIT</span> <span style="color: #cc66cc;">10</span><span style="color: #66cc66;">&#41;</span>  
<span style="color: #993333; font-weight: bold;">UNION</span>  
<span style="color: #66cc66;">&#40;</span><span style="color: #993333; font-weight: bold;">SELECT</span> a <span style="color: #993333; font-weight: bold;">FROM</span> tbl_name <span style="color: #993333; font-weight: bold;">WHERE</span> a<span style="color: #66cc66;">=</span><span style="color: #cc66cc;">11</span> <span style="color: #993333; font-weight: bold;">AND</span> B<span style="color: #66cc66;">=</span><span style="color: #cc66cc;">2</span> <span style="color: #993333; font-weight: bold;">ORDER</span> <span style="color: #993333; font-weight: bold;">BY</span> a <span style="color: #993333; font-weight: bold;">LIMIT</span> <span style="color: #cc66cc;">10</span><span style="color: #66cc66;">&#41;</span>;</pre></div></div>

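<p>Also note: an ORDER BY placed inside the parentheses of an individual SELECT only takes effect when it is combined with LIMIT. Otherwise the ORDER BY is optimized away.<br />
I tested UNION with ORDER BY, without any parentheses, on two InnoDB tables, as follows:</p>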

<div class="wp_syntax"><div class="code"><pre class="sql" style="font-family:monospace;"><span style="color: #993333; font-weight: bold;">SELECT</span> a <span style="color: #993333; font-weight: bold;">FROM</span> tbl_name <span style="color: #993333; font-weight: bold;">WHERE</span> a<span style="color: #66cc66;">=</span><span style="color: #cc66cc;">10</span> <span style="color: #993333; font-weight: bold;">AND</span> B<span style="color: #66cc66;">=</span><span style="color: #cc66cc;">1</span>  
<span style="color: #993333; font-weight: bold;">UNION</span>  
<span style="color: #993333; font-weight: bold;">SELECT</span> a <span style="color: #993333; font-weight: bold;">FROM</span> tbl_name <span style="color: #993333; font-weight: bold;">WHERE</span> a<span style="color: #66cc66;">=</span><span style="color: #cc66cc;">11</span> <span style="color: #993333; font-weight: bold;">AND</span> B<span style="color: #66cc66;">=</span><span style="color: #cc66cc;">2</span>  
<span style="color: #993333; font-weight: bold;">ORDER</span> <span style="color: #993333; font-weight: bold;">BY</span> a <span style="color: #993333; font-weight: bold;">LIMIT</span> <span style="color: #cc66cc;">10</span>;</pre></div></div>

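<p>It turns out that the default behavior is also to perform the UNION first and then the ORDER BY, so the result is the same as in the first case.<br />
Even so, adding the matching parentheses is the better choice because it keeps the logic clear.<br />
In addition, UNION in MySQL comes in three forms: UNION, UNION DISTINCT, and UNION ALL.<br />
UNION and UNION DISTINCT remove duplicates from the combined result, so every returned row is unique.<br />
UNION ALL returns every matching row from all of the SELECT statements.</p>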
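<p>The difference between UNION and UNION ALL, and the way a trailing ORDER BY sorts the combined result, can be tried out with a small sketch. It makes a loud assumption: it uses Python's built-in sqlite3 module as a stand-in, not MySQL, so it only illustrates the standard SQL behavior and says nothing about MySQL's optimizer. The tables t1 and t2 and their rows are made up, and only the unparenthesized form is shown.</p>

```python
import sqlite3

# In-memory SQLite database as a stand-in; this is NOT MySQL.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t1 (a INTEGER)')
conn.execute('CREATE TABLE t2 (a INTEGER)')
conn.executemany('INSERT INTO t1 VALUES (?)', [(3,), (1,)])
conn.executemany('INSERT INTO t2 VALUES (?)', [(2,), (1,)])

# UNION removes duplicates; the trailing ORDER BY sorts the combined result.
union = conn.execute(
    'SELECT a FROM t1 UNION SELECT a FROM t2 ORDER BY a').fetchall()

# UNION ALL keeps every matching row, duplicates included.
union_all = conn.execute(
    'SELECT a FROM t1 UNION ALL SELECT a FROM t2 ORDER BY a').fetchall()
```

<p>The value 1 appears in both tables, so it shows up once in union and twice in union_all.</p>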
]]></content:encoded>
			<wfw:commentRss>http://fendou.org/2011/11/16/mysql-union-order-by-problem/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Deploying Tornado applications with supervisor and nginx</title>
		<link>http://fendou.org/2011/09/23/supervisor-nginx-tornado/</link>
		<comments>http://fendou.org/2011/09/23/supervisor-nginx-tornado/#comments</comments>
		<pubDate>Fri, 23 Sep 2011 02:16:43 +0000</pubDate>
		<dc:creator>崔玉松</dc:creator>
				<category><![CDATA[Excellence Article]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://fendou.org/?p=850</guid>
		<description><![CDATA[Tornado is built for asynchronous (non-blocking) I/O, which makes it well suited to serving web applications. On Linux it uses epoll to handle asynchronous events, and performance is very good. However, Python is a scripting language and a Tornado program runs as a single process, so it cannot use more than one CPU, which wastes most of a modern multi-core machine. To improve performance and CPU utilization, the usual approach is to run (number of CPUs) × n Tornado processes. How can multiple Tornado processes be started conveniently? supervisor can manage them. supervisor is easy to install: easy_install supervisor is enough. Below is the supervisor configuration; I configured four Tornado services on one server. config ; supervisor. [group:gisapp] programs=gis-8001,gis-8002,gis-8003,gis-8004 [program:gis-8001] command=python /home/gis/gis/gisserver.py --port=8001 directory=/home/gis/gis/ autorestart=true redirect_stderr=true stdout_logfile=/home/gis/gis/logs/gis_server-8001.log stdout_logfile_maxbytes=500MB stdout_logfile_backups=50 stdout_capture_maxbytes=1MB stdout_events_enabled=false loglevel=warn [program:gis-8002] command=python /home/gis/gis/gisserver.py --port=8002 directory=/home/gis/gis/ autorestart=true redirect_stderr=true stdout_logfile=/home/gis/gis/gis_server-8002.log stdout_logfile_maxbytes=500MB stdout_logfile_backups=50 stdout_capture_maxbytes=1MB stdout_events_enabled=false loglevel=warn [program:gis-8003] command=python /home/gis/gis/gisserver.py --port=8003 directory=/home/gis/gis/ autorestart=true redirect_stderr=true stdout_logfile=/home/gis/gis/gis_server-8003.log stdout_logfile_maxbytes=500MB stdout_logfile_backups=50 stdout_capture_maxbytes=1MB stdout_events_enabled=false loglevel=warn [program:gis-8004] command=python /home/gis/gis/gisserver.py --port=8004 directory=/home/gis/gis/ autorestart=true redirect_stderr=true...  <a href="http://fendou.org/2011/09/23/supervisor-nginx-tornado/" class="more-link" title="Read Deploying Tornado applications with supervisor and nginx">Read more &#187;</a>]]></description>
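			<content:encoded><![CDATA[<p>Tornado is built for asynchronous (non-blocking) I/O, which makes it well suited to serving web applications. On Linux it uses epoll to handle asynchronous events, and performance is very good. However, Python is a scripting language and a Tornado program runs as a single process, so it cannot use more than one CPU, which wastes most of a modern multi-core machine. To improve performance and CPU utilization, the usual approach is to run (number of CPUs) × n Tornado processes.<br />
How can multiple Tornado processes be started conveniently? supervisor can manage them. supervisor is easy to install: easy_install supervisor is enough.<br />
Below is the supervisor configuration; I configured four Tornado services on one server.</p>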
<p>config</p>
<p>; supervisor.<br />
[group:gisapp]<br />
programs=gis-8001,gis-8002,gis-8003,gis-8004</p>
<p>[program:gis-8001]<br />
command=python /home/gis/gis/gisserver.py --port=8001<br />
directory=/home/gis/gis/<br />
autorestart=true<br />
redirect_stderr=true<br />
stdout_logfile=/home/gis/gis/logs/gis_server-8001.log<br />
stdout_logfile_maxbytes=500MB<br />
stdout_logfile_backups=50<br />
stdout_capture_maxbytes=1MB<br />
stdout_events_enabled=false<br />
loglevel=warn</p>
<p>[program:gis-8002]<br />
command=python /home/gis/gis/gisserver.py --port=8002<br />
directory=/home/gis/gis/<br />
autorestart=true<br />
redirect_stderr=true<br />
stdout_logfile=/home/gis/gis/gis_server-8002.log<br />
stdout_logfile_maxbytes=500MB<br />
stdout_logfile_backups=50<br />
stdout_capture_maxbytes=1MB<br />
stdout_events_enabled=false<br />
loglevel=warn<br />
[program:gis-8003]<br />
command=python /home/gis/gis/gisserver.py --port=8003<br />
directory=/home/gis/gis/<br />
autorestart=true<br />
redirect_stderr=true<br />
stdout_logfile=/home/gis/gis/gis_server-8003.log<br />
stdout_logfile_maxbytes=500MB<br />
stdout_logfile_backups=50<br />
stdout_capture_maxbytes=1MB<br />
stdout_events_enabled=false<br />
loglevel=warn<br />
[program:gis-8004]<br />
command=python /home/gis/gis/gisserver.py --port=8004<br />
directory=/home/gis/gis/<br />
autorestart=true<br />
redirect_stderr=true<br />
stdout_logfile=/home/gis/gis/gis_server-8004.log<br />
stdout_logfile_maxbytes=500MB<br />
stdout_logfile_backups=50<br />
stdout_capture_maxbytes=1MB<br />
stdout_events_enabled=false<br />
loglevel=warn<br />
How can the four ports serve requests together? nginx has built-in load balancing,<br />
so it can spread incoming requests across the four services.</p>
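<p>As a rough illustration of what spreading requests over four backends means, the Python sketch below models plain round-robin assignment. It rests on stated assumptions: nginx's default upstream method is a weighted round-robin that also reacts to failed backends, which this model ignores, and the names BACKENDS and assign are invented for this example.</p>
<pre>
```python
from collections import Counter
from itertools import cycle

# The four local Tornado ports from the supervisor configuration.
BACKENDS = ["127.0.0.1:8001", "127.0.0.1:8002", "127.0.0.1:8003", "127.0.0.1:8004"]


def assign(num_requests, backends=BACKENDS):
    """Assign requests to backends in strict rotation (unweighted round-robin)."""
    rotation = cycle(backends)
    return [next(rotation) for _ in range(num_requests)]


if __name__ == "__main__":
    # Count how many of ten requests each backend would receive.
    counts = Counter(assign(10))
    for backend in BACKENDS:
        print(backend, counts[backend])
```
</pre>
<p>With equal weights, each backend receives either the same number of requests as the others or one more, which is why running one Tornado process per port keeps all of them busy.</p>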
<p>nginx config</p>
<p>    upstream gisserver{<br />
            server 127.0.0.1:8001;<br />
            server 127.0.0.1:8002;<br />
            server 127.0.0.1:8003;<br />
            server 127.0.0.1:8004;<br />
    }<br />
location /tile/ {<br />
    proxy_pass        http://gisserver;<br />
    proxy_set_header  X-Real-IP  $remote_addr;<br />
    proxy_pass_header Set-Cookie;<br />
}<br />
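Update:</p>
<p>The configuration above can be simplified: the supervisord configuration supports variable expansion, so a single program section can describe all four processes.</p>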
<p>; supervisor.<br />
[group:gisapp]<br />
programs=gis-web</p>
<p>[program:gis-web]<br />
; process_name must include %(process_num) when numprocs is greater than 1<br />
process_name=%(program_name)s-%(process_num)02d<br />
command=python /home/gis/gis/gisserver.py --port=80%(process_num)02d<br />
directory=/home/gis/gis/<br />
autorestart=true<br />
redirect_stderr=true<br />
stdout_logfile=/home/gis/gis/logs/gis_server-80%(process_num)02d.log<br />
stdout_logfile_maxbytes=500MB<br />
stdout_logfile_backups=50<br />
stdout_capture_maxbytes=1MB<br />
stdout_events_enabled=false<br />
loglevel=warn<br />
numprocs=4<br />
numprocs_start=1<br />
On a different server, only the numprocs value needs to be adjusted (the nginx upstream list must then list the matching ports).</p>
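<p>The 80%(process_num)02d expression can be checked outside supervisor. As far as I know, supervisor expands these expressions with ordinary Python % string formatting, so the sketch below reproduces the expansion for process numbers starting at numprocs_start. The helper names expand and ports are hypothetical and are not part of supervisor.</p>
<pre>
```python
def expand(template, process_num, program_name="gis-web"):
    """Expand a supervisor-style %(name)s template with Python % formatting."""
    return template % {"process_num": process_num, "program_name": program_name}


def ports(numprocs, numprocs_start=1, template="80%(process_num)02d"):
    """Return the expanded port strings for each process number in turn."""
    first = numprocs_start
    return [expand(template, n) for n in range(first, first + numprocs)]


if __name__ == "__main__":
    print(ports(4))
    print(expand("gis_server-80%(process_num)02d.log", 3))
```
</pre>
<p>Note that the 02d padding only yields four-digit ports for process numbers up to 99; larger values would need a different template.</p>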
]]></content:encoded>
			<wfw:commentRss>http://fendou.org/2011/09/23/supervisor-nginx-tornado/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>An overview of Taobao's data-processing projects and open-source products</title>
		<link>http://fendou.org/2011/09/17/taobao-project/</link>
		<comments>http://fendou.org/2011/09/17/taobao-project/#comments</comments>
		<pubDate>Sat, 17 Sep 2011 01:11:19 +0000</pubDate>
		<dc:creator>崔玉松</dc:creator>
				<category><![CDATA[Tools]]></category>
		<category><![CDATA[Application Architecture]]></category>
		<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://fendou.org/?p=847</guid>
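		<description><![CDATA[Among Chinese internet companies, Taobao has long been near the front in data storage and processing. Because of the particular demands of e-commerce, it has also stayed ahead domestically in real-time data, large-scale computation, and data mining, and has accumulated a good deal of practical experience and many products along the way. &#160; TimeTunnel  A messaging middleware built on HBase. It offers the traditional guarantees (high reliability, message ordering, transactions) and also lets consumers re-subscribe, by time, to any recent historical data. A high-performance broker: 20,000 TPS on a single node, with over a thousand concurrent long-lived connections in production. Carries massive data transfers: 10 TB synchronized per day, including critical data such as Taobao's core revenue figures. More than 2,000 clients are deployed across the IDCs, covering log transport for the whole site. Scribe, Flume, ActiveMQ, ZeroMQ? We can build something more powerful. TBFS A complete overhaul of HDFS 0.20. Design goals: up to 10,000 servers in a single cluster, storing 1 billion files and 100 PB of data. A new design ahead of the community version that eliminates the NameNode single point of failure and allows online cluster upgrades. Challenges waiting for you: snapshots, cross-site data replication, multi-level caching, and support for hard and symbolic links. Hbase Modified from HBase 0.90.3. More than a hundred HBase servers currently support seven of Taobao's online applications and store 100 TB of online data. Supports data-local computation and secondary indexes. Challenges waiting for you: non-blocking compaction, broader transaction support, shorter request latency, and more powerful indexing (Lucene for HBase). Mapreduce Modified from Hadoop 0.19. The largest single cluster has 2,000 servers, and the vast majority of the Hadoop 0.20 API is supported. Stores more than 10 PB of data and runs 50,000 MapReduce jobs per day. Challenges waiting for you: more efficient task scheduling, more elegant management of compute resources, and a more flexible distributed computing model. Hive Modified from Hive 0.6 with more than a hundred patches, adding features such as reuse of intermediate SQL results. Supports nearly all of Taobao's business data analysis, and is an essential skill for data analysts and data engineers in every business line. Challenges waiting for you: can Hive &#38; Pig be mixed in one program? Not today, but if you dare to imagine it, you can come and build it! Taobao-pamirs-schedule  taobao-pamirs-schedule is a multi-threaded task-processing framework for distributed environments. Its goal is to let a batch of tasks, or a continually changing set of tasks, be dynamically assigned to JVMs on multiple hosts and executed in parallel across different thread groups. Every task is processed quickly, with no duplicates and no omissions. The framework abstracts the tasks to be run into a uniform task model that is managed and monitored centrally. With schedule, tasks are distributed fairly evenly across machines, and capacity can be scaled out dynamically. QLExpress  A lightweight script engine, embedded in business systems as a rule engine. It keeps business rules simple to define without losing flexibility, so that business staff can define the rules themselves. It supports standard Java syntax, plus custom operators, operator overloading, function definitions, macro definitions, lazy data loading, and more. UIC UIC is a system for massive data that is designed to be highly stable, highly concurrent, highly responsive, highly reliable, and strongly consistent. Massive data: the user center now holds nearly 600 million registered users; with addresses and Alipay binding data, the total approaches 2 billion, sharded across 16 databases and 1,024 tables. High stability and reliability: the user center is one of Taobao's most critical systems. A complete transaction calls UIC dozens of times, so UIC's stability is a top priority for all of Taobao. We have built many disaster-recovery measures for it, including multi-datacenter backups, cache failover, MySQL failover, and traffic control; the core of UIC is arguably its disaster-recovery design and its handling of extreme situations. High concurrency and responsiveness: UIC serves about 20 billion data accesses per day. We use Tair as the cache and protobuf for serialization to push the cache hit rate as high as possible; the hit rate for user data is currently 99%. Prom  A real-time computation framework for massive data. It uses search technology to run real-time computations over massive volumes of detail records. It is currently used mainly to analyze transaction data in Data Cube (数据魔方). Features: Combined queries over multi-dimensional indexes Computation over arbitrary dimensions Real-time responses (within seconds) Exact results Andes  Andes is a high-performance query cluster built on HBase for querying arbitrary data over long time spans. It removes Data Cube's limits on the queryable time range. Data is stored in a key-list layout, so a query over any length of time needs only one database access, which keeps query performance independent of the time span queried. KeyKeys  An analysis system for users' search queries. It is used in Taoci (淘词), where it matches user input in real time to look up and compute key queries and trending keywords. Myfox/Nodefox  MyFOX is a high-performance distributed middle layer over a MySQL cluster, designed for massive statistical data. It handles more than 90% of Data Cube's data storage and query needs. MyFOX provides: •...  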
<a href="http://fendou.org/2011/09/17/taobao-project/" class="more-link" title="Read An overview of Taobao's data-processing projects and open-source products">Read more &#187;</a>]]></description>
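			<content:encoded><![CDATA[<p>Among Chinese internet companies, Taobao has long been near the front in data storage and processing. Because of the particular demands of e-commerce, it has also stayed ahead domestically in real-time data, large-scale computation, and data mining, and has accumulated a good deal of practical experience and many products along the way.</p>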
<section>
<h3 id="section-0"><a href="http://code.taobao.org/project/view/411/" target="open_source">TimeTunnel <img title="os" src="http://www.tbdata.org/wp-content/uploads/2011/09/os.gif" alt="" width="45" height="19" /></a></h3>
<ol>
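<li>A messaging middleware built on HBase. It offers the traditional guarantees (high reliability, message ordering, transactions) and also lets consumers re-subscribe, by time, to any recent historical data.</li>
<li>A high-performance broker: 20,000 TPS on a single node, with over a thousand concurrent long-lived connections in production.</li>
<li>Carries massive data transfers: 10 TB synchronized per day, including critical data such as Taobao's core revenue figures.</li>
<li>More than 2,000 clients are deployed across the IDCs, covering log transport for the whole site.</li>
<li>Scribe, Flume, ActiveMQ, ZeroMQ? We can build something more powerful.</li>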
</ol>
</section>
<section>
<h3 id="section-1">TBFS</h3>
<ol>
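<li>A complete overhaul of HDFS 0.20. Design goals: up to 10,000 servers in a single cluster, storing 1 billion files and 100 PB of data.</li>
<li>A new design ahead of the community version that eliminates the NameNode single point of failure and allows online cluster upgrades.</li>
<li>Challenges waiting for you: snapshots, cross-site data replication, multi-level caching, and support for hard and symbolic links.</li>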
</ol>
</section>
<section>
<h3 id="section-2">Hbase</h3>
<ol>
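<li>Modified from HBase 0.90.3. More than a hundred HBase servers currently support seven of Taobao's online applications and store 100 TB of online data.</li>
<li>Supports data-local computation and secondary indexes.</li>
<li>Challenges waiting for you: non-blocking compaction, broader transaction support, shorter request latency, and more powerful indexing (Lucene for HBase).</li>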
</ol>
</section>
<section>
<h3 id="section-3">Mapreduce</h3>
<ol>
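<li>Modified from Hadoop 0.19. The largest single cluster has 2,000 servers, and the vast majority of the Hadoop 0.20 API is supported.</li>
<li>Stores more than 10 PB of data and runs 50,000 MapReduce jobs per day.</li>
<li>Challenges waiting for you: more efficient task scheduling, more elegant management of compute resources, and a more flexible distributed computing model.</li>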
</ol>
</section>
<section>
<h3 id="section-4">Hive</h3>
<ol>
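<li>Modified from Hive 0.6 with more than a hundred patches, adding features such as reuse of intermediate SQL results.</li>
<li>Supports nearly all of Taobao's business data analysis, and is an essential skill for data analysts and data engineers in every business line.</li>
<li>Challenges waiting for you: can Hive &amp; Pig be mixed in one program? Not today, but if you dare to imagine it, you can come and build it!</li>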
</ol>
</section>
<section>
<h3 id="section-5"><a href="http://code.taobao.org/trac/taobao-pamirs-schedule/wiki/ZhWikiStart" target="open_source">Taobao-pamirs-schedule <img title="os" src="http://www.tbdata.org/wp-content/uploads/2011/09/os.gif" alt="" width="45" height="19" /></a></h3>
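<p>taobao-pamirs-schedule is a multi-threaded task-processing framework for distributed environments. Its goal is to let a batch of tasks, or a continually changing set of tasks, be dynamically assigned to JVMs on multiple hosts and executed in parallel across different thread groups. Every task is processed quickly, with no duplicates and no omissions. The framework abstracts the tasks to be run into a uniform task model that is managed and monitored centrally. With schedule, tasks are distributed fairly evenly across machines, and capacity can be scaled out dynamically.</p></section>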
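<p>One simple way to get the "no duplicates, no omissions" property described for taobao-pamirs-schedule is to give every task exactly one owner. The Python sketch below does this with a modulo rule over integer task ids. It is an assumption made for illustration, not the framework's actual assignment algorithm, and the function names owner and partition are hypothetical.</p>
<pre>
```python
def owner(task_id, num_workers):
    """Return the index of the single worker responsible for task_id."""
    return task_id % num_workers


def partition(task_ids, num_workers):
    """Group task ids by owning worker; every id lands in exactly one group."""
    groups = [[] for _ in range(num_workers)]
    for task_id in task_ids:
        groups[owner(task_id, num_workers)].append(task_id)
    return groups


if __name__ == "__main__":
    # Ten tasks shared among three workers.
    for index, group in enumerate(partition(range(10), 3)):
        print("worker", index, group)
```
</pre>
<p>Because each id maps to one worker, no task is processed twice or skipped. Changing the number of workers reassigns tasks, which is where a real framework needs coordination that this sketch leaves out.</p>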
<section>
<h3 id="section-6"><a href="http://code.taobao.org/trac/QLExpress/wiki/ZhWikiStart" target="open_source">QLExpress <img title="os" src="http://www.tbdata.org/wp-content/uploads/2011/09/os.gif" alt="" width="45" height="19" /></a></h3>
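<p>A lightweight script engine, embedded in business systems as a rule engine. It keeps business rules simple to define without losing flexibility, so that business staff can define the rules themselves. It supports standard Java syntax, plus custom operators, operator overloading, function definitions, macro definitions, lazy data loading, and more.</p></section>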
<section>
<h3 id="section-7">UIC</h3>
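<p>UIC is a system for massive data that is designed to be highly stable, highly concurrent, highly responsive, highly reliable, and strongly consistent. Massive data: the user center now holds nearly 600 million registered users; with addresses and Alipay binding data, the total approaches 2 billion, sharded across 16 databases and 1,024 tables. High stability and reliability: the user center is one of Taobao's most critical systems. A complete transaction calls UIC dozens of times, so UIC's stability is a top priority for all of Taobao. We have built many disaster-recovery measures for it, including multi-datacenter backups, cache failover, MySQL failover, and traffic control; the core of UIC is arguably its disaster-recovery design and its handling of extreme situations. High concurrency and responsiveness: UIC serves about 20 billion data accesses per day. We use Tair as the cache and protobuf for serialization to push the cache hit rate as high as possible; the hit rate for user data is currently 99%.</p></section>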
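<p>The text gives only the shard counts for UIC (16 databases, 1,024 tables), not the routing rule. As a hedged sketch of how such a layout is commonly addressed, the Python below assumes an integer user id, a modulo rule, and 64 consecutive tables per database; all three are assumptions for illustration, and the name route is hypothetical.</p>
<pre>
```python
NUM_DATABASES = 16
NUM_TABLES = 1024
TABLES_PER_DATABASE = NUM_TABLES // NUM_DATABASES  # 64 tables in each database


def route(user_id):
    """Map a non-negative integer user id to (database index, table index)."""
    table = user_id % NUM_TABLES                 # assumed rule: modulo on the user id
    database = table // TABLES_PER_DATABASE      # assumed layout: consecutive tables share a database
    return database, table


if __name__ == "__main__":
    for user_id in (0, 64, 1023, 1024):
        print(user_id, route(user_id))
```
</pre>
<p>A fixed rule like this lets any application server locate a user's row without a lookup service, at the cost of making a later change to the table count expensive.</p>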
<section>
<h3 id="section-8">Prom <img title="to-os" src="http://www.tbdata.org/wp-content/uploads/2011/09/to-os.gif" alt="" width="57" height="19" /></h3>
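<p>A real-time computation framework for massive data. It uses search technology to run real-time computations over massive volumes of detail records. It is currently used mainly to analyze transaction data in Data Cube (数据魔方). Features:</p>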
<ol>
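<li>Combined queries over multi-dimensional indexes</li>
<li>Computation over arbitrary dimensions</li>
<li>Real-time responses (within seconds)</li>
<li>Exact results</li>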
</ol>
</section>
<section>
<h3 id="section-9">Andes <img title="to-os" src="http://www.tbdata.org/wp-content/uploads/2011/09/to-os.gif" alt="" width="57" height="19" /></h3>
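<p>Andes is a high-performance query cluster built on HBase for querying arbitrary data over long time spans. It removes Data Cube's limits on the queryable time range. Data is stored in a key-list layout, so a query over any length of time needs only one database access, which keeps query performance independent of the time span queried.</p></section>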
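<p>The key-list idea can be sketched in a few lines of Python: each key holds its whole time series as one sorted list, so a range query is one fetch followed by an in-memory slice. The in-memory STORE dictionary, the key name, and the sample values are all hypothetical stand-ins; the real system stores its rows in HBase, and its row format is not described here.</p>
<pre>
```python
from bisect import bisect_left, bisect_right

# Hypothetical store: one row per key, holding the whole series sorted by day.
STORE = {
    "shop:42:sales": [
        ("2011-08-01", 10),
        ("2011-08-02", 12),
        ("2011-08-03", 7),
        ("2011-08-04", 15),
    ],
}


def query(key, start_day, end_day, store=STORE):
    """Fetch the row for key once, then slice the days from start_day to end_day inclusive."""
    series = store.get(key, [])            # the single "database access"
    days = [day for day, _ in series]      # ISO dates sort correctly as strings
    first = bisect_left(days, start_day)
    last = bisect_right(days, end_day)
    return series[first:last]


if __name__ == "__main__":
    print(query("shop:42:sales", "2011-08-02", "2011-08-03"))
```
</pre>
<p>Whether the range covers two days or two years, the store is read once; only the size of the slice changes.</p>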
<section>
<h3 id="section-10">KeyKeys <img title="to-os" src="http://www.tbdata.org/wp-content/uploads/2011/09/to-os.gif" alt="" width="57" height="19" /></h3>
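<p>An analysis system for users' search queries. It is used in Taoci (淘词), where it matches user input in real time to look up and compute key queries and trending keywords.</p></section>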
<section>
<h3 id="section-11">Myfox/Nodefox <img title="to-os" src="http://www.tbdata.org/wp-content/uploads/2011/09/to-os.gif" alt="" width="57" height="19" /></h3>
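<p>MyFOX is a high-performance distributed middle layer over a MySQL cluster, designed for massive statistical data. It handles more than 90% of Data Cube's data storage and query needs. MyFOX provides:</p>
<ol>
<li>Data sharding rules that combine table fields with row counts</li>
<li>A fully transparent, standard SQL query interface</li>
<li>The same query performance for the same SQL statement at 1 billion rows as at 10 million rows</li>
<li>Cross-datacenter redundancy for every piece of data; when a single machine fails, the affected shards are quickly re-replicated within the cluster</li>
<li>Separation of hot and cold data; frequently queried shards are monitored in real time and, when needed, automatically replicated further within the cluster</li>
</ol>
</section>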
<section>
<h3 id="section-12">Glider <img title="to-os" src="http://www.tbdata.org/wp-content/uploads/2011/09/to-os.gif" alt="" width="57" height="19" /></h3>
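<p>Glider is a unified data middle layer built on top of heterogeneous data sources such as MyFOX, Prom, and KeyKeys. It is the single query outlet for data products including Data Cube, Taobao Index, and the open API. Glider abstracts the heterogeneous sources at a high level and, on that basis, performs general-purpose computation such as JOIN, UNION, sorting, deduplication, and expression evaluation. This complex process is set up through simple configuration alone. Glider currently handles more than 20 million query requests per machine per day; on August 25 the average response time was 126 ms.</p></section>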
<section>
<h3 id="section-13">Node.js</h3>
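<p>Node.js is a server-side JavaScript runtime built on the V8 engine, offering non-blocking, event-driven, asynchronous I/O. It is of great practical value for high-load application services and for making the most of server hardware. We were the first team in China to bring Node.js into commercial internet development, and have used it to build a series of web products such as taojob (http://taojob.tbdata.org) and the Data Cube Club. We are currently using Node.js to upgrade MyFOX and Glider, and applying it to the development of the Taobao Index product.</p></section>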
<section>
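<h3 id="section-14">Data visualization</h3>
<p>Data visualization is the study of using graphical means to convey and communicate information clearly and effectively. Since its founding in 2010, the Taobao Data Visualization Lab has used the latest visualization techniques to analyze Taobao's massive commercial data. Through a series of visualization applications it shows the outside world the value hidden in Taobao's data and the beauty of the data itself, giving users a new way to understand and analyze data.</p></section>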
<section>
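<h3 id="section-15">Motion-sensing interaction</h3>
<p>By researching the latest motion-sensing interaction technology, we aim to bring a revolutionary interactive experience to data visualization and data products, making internet data products easier to use.</p></section>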
<section>
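<h3 id="section-16">Distributed recommendation system</h3>
<p>One of the core applications of the personalization-oriented data processing platform (PDP), built on Hadoop and Mahout distributed machine learning. The architecture is a two-layer recommendation engine, offline computation plus online serving, and the data side is divided into a collection center, an algorithm center, a publishing center, and an evaluation center.</p>
<p>Mining the shopping patterns of Taobao consumers: this is a sub-project of the Taobao Index project. It analyzes consumers' historical shopping behavior to discover and identify their shopping patterns and shopping psychology. The project analyzes a graph of associations between shopping categories and uses graph-theoretic techniques to find consumers with similar shopping patterns. Its core pieces are constructing the category-similarity graph and discovering clusters of similar shopping behavior.</p></section>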
<section>
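<h3 id="section-17">Sentiment scoring of product reviews</h3>
<p>The product-review sentiment scoring system works on Taobao's massive review data. It uses association-rule mining to build a set of high-frequency feature words; then, through semantic analysis and analysis of consumer sentiment orientation, combined with each reviewer's own reviewing habits (a reviewer score), it assigns a score to each review and from these derives the product's final review score. The score reflects how satisfied buyers are with the product.</p></section>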
<section>
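<h3 id="section-18">Galaxy (银河) stream-processing platform</h3>
<p>A general-purpose real-time computation system for streaming data. Its goals are low latency, high throughput, and reusability of real-time output. It uses the actor model to build a distributed stream-computation framework (on top of Akka) that is easy to extend, partially fault-tolerant, and exposes data and state for monitoring. Galaxy can process both real-time streams (such as data collected by TimeTunnel) and static data (such as local files and HDFS files). It provides flexible real-time output, plus a custom output interface for extending its real-time capabilities. Galaxy is currently used mainly to compute and analyze real-time transaction, browsing, and search logs for Data Cube.</p></section>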
<section>
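<h3 id="section-19">Open data architecture</h3>
<p>A data architecture and data processing platform that is genuinely cloud-based. Following the design principles of transparency, standardization, and privacy protection, it delivers an open data architecture that brings together subject-area research, mining algorithms, and real-time computed data.</p></section>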
<section>
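<h3 id="section-20">Extreme storage</h3>
<p>A classic combination of data warehousing and distributed computing. On Yunti 1 (云梯1) it achieves data compression ratios of up to 120:1. More than 30 kinds of business data have adopted it so far, saving a cumulative 15 PB of storage. It has also markedly improved data access efficiency and reduced compute consumption.</p></section>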
<section>
<h3 id="section-21">Dbsync</h3>
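<p>A product for synchronizing database data to HDFS in real time. It parses the log files of various RDBMSs to extract the corresponding database operations and replicate them from the database to Hadoop, so that other departments can pull incremental data. With Dbsync, the full change history of any piece of data can be traced and retrieved.</p></section>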
<section>
<h3 id="section-22">DataX <img title="to-os" src="http://www.tbdata.org/wp-content/uploads/2011/09/to-os.gif" alt="" width="57" height="19" /></h3>
<div id="_mcePaste">
<ol>
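<li>DataX is a tool for exchanging data at high speed between heterogeneous databases and file systems.</li>
<li>It is built on a framework-plus-plugin architecture. The framework handles most of the technical problems of high-speed data exchange (buffering, flow control, concurrency, context loading), so a plugin only has to implement access to its data system.</li>
<li>Run modes: stand-alone / on Hadoop</li>
<li>Data transfer happens entirely inside one process, in memory, with no disk reads or writes and no IPC.</li>
<li>The framework is open: a developer can write a new plugin in very little time to support a new database or file system.</li>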
</ol>
</div>
</section>
<section>
<h3 id="section-23">SKYNET</h3>
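<p>The SKYNET (天网) scheduling system is the core scheduler of Taobao's data platform. It handles scheduling and operations for more than 10,000 jobs across departments and dozens of business lines. It is graphical and cross-platform, with automatic deployment, online operations, and intelligent disaster recovery, and serves as the hub of Taobao's data platform.</p></section>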
<section>
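<h3 id="section-24">Data development service platform</h3>
<p>The data development service platform integrates an IDE, scheduling, monitoring, alerting, metadata, cost optimization, access control, auditing, user management, and other functions. It hides complex technical details inside the platform and gives users a simple, convenient experience, so developers can focus on business requirements. This lowers the barrier to building data applications and doing data analysis on Yunti (云梯).</p></section>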
<section>
<h3 id="section-25">SuperMario</h3>
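<p>Real-time processing for massive data: SuperMario is a high-performance stream-processing framework built with Erlang and a ZooKeeper module. It uses the subscriber pattern to define the flow relationships between stream nodes and supports high-performance, real-time stream processing.</p></section>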
<section>
<h3 id="section-26"><a href="http://openresty.org/" target="open_source">Openresty <img title="os" src="http://www.tbdata.org/wp-content/uploads/2011/09/os.gif" alt="" width="45" height="19" /></a></h3>
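<p>Higher concurrency at lower cost: Openresty is the web service framework of 量子, built on Nginx. It makes the web server the core container of the 量子 website, and through the nginx_lua_mod extension, high-performance web services can be developed efficiently and conveniently.</section>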
<section>
<h3 id="section-27">LzSQL</h3>
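<p>More efficient, agile data development: LzSQL is a small database language for 量子 built on the perl::parser module. It encapsulates database sharding (splitting across databases and tables) and real-time fusion of heterogeneous data (databases and third-party engines), which makes it easy to develop REST data interfaces quickly.</section>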
]]></content:encoded>
			<wfw:commentRss>http://fendou.org/2011/09/17/taobao-project/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MySQL Write Optimization</title>
		<link>http://fendou.org/2011/08/30/mysql%e5%86%99%e5%85%a5%e4%bc%98%e5%8c%96/</link>
		<comments>http://fendou.org/2011/08/30/mysql%e5%86%99%e5%85%a5%e4%bc%98%e5%8c%96/#comments</comments>
		<pubDate>Tue, 30 Aug 2011 15:03:14 +0000</pubDate>
		<dc:creator>崔玉松</dc:creator>
				<category><![CDATA[Excellence Article]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://fendou.org/?p=846</guid>
		<description><![CDATA[innodb_buffer_pool_size If you use InnoDB, this is an important variable. InnoDB is far more sensitive to buffer size than MyISAM. MyISAM may get by with the default key_buffer_size even on a large data set, but InnoDB with default values on a large data set crawls. The InnoDB buffer pool caches both data and indexes, so there is no need to leave room for the OS cache; on an InnoDB-only server, this value can be set to 70%-80% of memory. As with key_buffer, if the data set is small and will not grow much, do not oversize this value; the memory can be put to better use elsewhere. innodb_additional_mem_pool_size This one has little effect, at least when the operating system allocates memory sensibly. You may still want to set it to 20M or a bit more so you can see how much memory InnoDB allocates for other needs. innodb_log_file_size Very important for write-heavy workloads, especially with large data sets. Note that larger files give better performance but lengthen database recovery. I usually use 64M-512M, depending on the size of the server. innodb_log_buffer_size The default is fine for most workloads with moderate writes and short transactions. If you update frequently or use a lot of BLOB data, increase it. Setting it too high just wastes memory, though: the buffer is flushed once a second anyway, so it never needs to hold more than one second's worth of updates. 8M-16M is usually enough, and small installations can use less. innodb_flush_log_at_trx_commit (this one really helps) Complaining that InnoDB is 100 times slower than MyISAM? You probably forgot to adjust this value. The default of 1 means every transaction commit (or every statement outside a transaction) must flush the log to disk, which is expensive, especially without a battery-backed cache. A value of 2 is fine for many applications, particularly those converted from MyISAM tables: the log is written to the OS cache rather than flushed to disk at each commit. The log is still flushed to disk every second, so you would normally lose no more than 1-2 seconds of updates. A value of 0 is slightly faster but less safe: transactions can be lost even if only MySQL crashes. With a value of 2, data can be lost only if the whole operating system crashes. The above comes from the web. I found many UPDATE and INSERT statements in my slow query log, so I changed innodb_flush_log_at_trx_commit to 2 and the improvement was obvious; 0 would help even more, but it is less safe. To apply the setting, make the following change and restart mysqld: vim /etc/my.cnf innodb_flush_log_at_trx_commit=2 It can also be set while mysqld is running: set GLOBAL innodb_flush_log_at_trx_commit = 2 The MySQL manual explains innodb_flush_log_at_trx_commit as follows: With a value of 0, the log buffer is written to the log file once per second and the log file is flushed to disk at the same time; nothing is done at transaction commit. With a value of 1 (the default), the log buffer is written to the log file at each transaction commit and the log file is flushed to disk. With a value of 2, the log buffer is written to the log file at each commit, but the flush to disk is not performed then; instead, MySQL flushes to disk once per second. Note: because of process scheduling, the once-per-second flush is not guaranteed to happen exactly every second. The default of 1 is required for ACID (atomicity, consistency, isolation, durability) compliance. With any other value you get better performance, but a crash can cost you up to one second of transactions. With a value of 0, any mysqld process crash can lose the last second of transactions. With a value of 2, the last second of transactions can be lost only on an operating system crash or power outage. InnoDB crash recovery is not affected by this value; it works whatever the setting. The innodb_flush_method parameter also deserves attention, since it affects writes: innodb_flush_method sets how InnoDB performs synchronous I/O: 1)...  <a href="http://fendou.org/2011/08/30/mysql%e5%86%99%e5%85%a5%e4%bc%98%e5%8c%96/" class="more-link" title="Read MySQL写入优化">Read more &#187;</a>]]></description>
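			<content:encoded><![CDATA[<p><strong>innodb_buffer_pool_size</strong><br />
If you use InnoDB, this is an important variable. InnoDB is far more sensitive to buffer size than MyISAM. MyISAM may get by with the default key_buffer_size even on a large data set, but InnoDB with default values on a large data set crawls. The InnoDB buffer pool caches both data and indexes, so there is no need to leave room for the OS cache; on an InnoDB-only server, this value can be set to 70%-80% of memory. As with key_buffer, if the data set is small and will not grow much, do not oversize this value; the memory can be put to better use elsewhere.<br />
<strong>innodb_additional_mem_pool_size</strong><br />
This one has little effect, at least when the operating system allocates memory sensibly. You may still want to set it to 20M or a bit more so you can see how much memory InnoDB allocates for other needs.<br />
<strong>innodb_log_file_size</strong><br />
Very important for write-heavy workloads, especially with large data sets. Note that larger files give better performance but lengthen database recovery. I usually use 64M-512M, depending on the size of the server.<br />
<strong>innodb_log_buffer_size</strong><br />
The default is fine for most workloads with moderate writes and short transactions. If you update frequently or use a lot of BLOB data, increase it. Setting it too high just wastes memory, though: the buffer is flushed once a second anyway, so it never needs to hold more than one second's worth of updates. 8M-16M is usually enough, and small installations can use less.<br />
<strong>innodb_flush_log_at_trx_commit</strong> (this one really helps)<br />
Complaining that InnoDB is 100 times slower than MyISAM? You probably forgot to adjust this value. The default of 1 means every transaction commit (or every statement outside a transaction) must flush the log to disk, which is expensive, especially without a battery-backed cache. A value of 2 is fine for many applications, particularly those converted from MyISAM tables: the log is written to the OS cache rather than flushed to disk at each commit. The log is still flushed to disk every second, so you would normally lose no more than 1-2 seconds of updates. A value of 0 is slightly faster but less safe: transactions can be lost even if only MySQL crashes. With a value of 2, data can be lost only if the whole operating system crashes.</p>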
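<p>The rules of thumb above can be collected into a small helper that prints a my.cnf fragment. This is only a sketch: the function names are hypothetical, and it assumes an InnoDB-only server and picks the conservative end of each range quoted above (70% of memory, 64M log files, 8M log buffer). Any real value should be checked against the workload.</p>

```python
def suggest_innodb_settings(ram_mb):
    """Return my.cnf values for an InnoDB-only server with ram_mb of memory.

    Assumption: takes the low end of each range given in the text.
    """
    return {
        # 70%-80% of memory for the buffer pool; use 70% here.
        "innodb_buffer_pool_size": f"{ram_mb * 70 // 100}M",
        # Mostly informational; 20M as suggested in the text.
        "innodb_additional_mem_pool_size": "20M",
        # 64M-512M; larger is faster but lengthens recovery.
        "innodb_log_file_size": "64M",
        # 8M-16M is usually enough.
        "innodb_log_buffer_size": "8M",
        # 2 = write to the OS cache at commit, flush to disk about once a second.
        "innodb_flush_log_at_trx_commit": "2",
    }


def render_my_cnf(settings):
    """Render the settings as a [mysqld] section."""
    lines = ["[mysqld]"]
    lines.extend(f"{name}={value}" for name, value in settings.items())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_my_cnf(suggest_innodb_settings(8192)))
```

<p>Note that changing innodb_log_file_size on an existing server of this era requires a clean shutdown and removal of the old log files before restart, so the fragment is not something to paste in blindly.</p>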
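<p>The above comes from the web. I found many UPDATE and INSERT statements in my slow query log, so I changed innodb_flush_log_at_trx_commit to 2 and the improvement was obvious; 0 would help even more, but it is less safe. To apply the setting, make the following change and restart mysqld:<br />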
vim /etc/my.cnf<br />
innodb_flush_log_at_trx_commit=2</p>
<p>It can also be set while mysqld is running:<br />
set GLOBAL innodb_flush_log_at_trx_commit = 2</p>
<p>The MySQL manual explains innodb_flush_log_at_trx_commit as follows:<br />
With a value of 0, the log buffer is written to the log file once per second and the log file is flushed to disk at the same time; nothing is done at transaction commit. With a value of 1 (the default), the log buffer is written to the log file at each transaction commit and the log file is flushed to disk. With a value of 2, the log buffer is written to the log file at each commit, but the flush to disk is not performed then; instead, MySQL flushes to disk once per second. Note: because of process scheduling, the once-per-second flush is not guaranteed to happen exactly every second.<br />
The default of 1 is required for ACID (atomicity, consistency, isolation, durability) compliance. With any other value you get better performance, but a crash can cost you up to one second of transactions. With a value of 0, any mysqld process crash can lose the last second of transactions. With a value of 2, the last second of transactions can be lost only on an operating system crash or power outage. InnoDB crash recovery is not affected by this value; it works whatever the setting.</p>
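<p>The three settings differ only in what happens to the log at commit time, and the exposure to data loss follows from that. The sketch below encodes the manual's description as a lookup table; the names <code>FlushPolicy</code>, <code>POLICIES</code>, and <code>may_lose_committed</code> are illustrative, not part of MySQL.</p>

```python
from typing import NamedTuple


class FlushPolicy(NamedTuple):
    written_at_commit: bool  # log buffer written to the log file (OS cache) at commit
    flushed_at_commit: bool  # log file flushed to disk at commit


# Behavior of innodb_flush_log_at_trx_commit as described in the text above.
POLICIES = {
    0: FlushPolicy(written_at_commit=False, flushed_at_commit=False),
    1: FlushPolicy(written_at_commit=True, flushed_at_commit=True),
    2: FlushPolicy(written_at_commit=True, flushed_at_commit=False),
}


def may_lose_committed(value, failure):
    """Can roughly the last second of committed transactions be lost?

    failure is "mysqld_crash" (the OS survives, so data already handed to
    the OS cache is safe) or "os_crash" (an OS crash or power loss, where
    only data flushed to disk is safe).
    """
    policy = POLICIES[value]
    if failure == "mysqld_crash":
        return not policy.written_at_commit
    if failure == "os_crash":
        return not policy.flushed_at_commit
    raise ValueError(f"unknown failure type: {failure!r}")
```

<p>For example, <code>may_lose_committed(2, "mysqld_crash")</code> is false while <code>may_lose_committed(2, "os_crash")</code> is true, which is the trade-off the post relies on when it chooses the value 2.</p>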
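<p>The <strong>innodb_flush_method</strong> parameter also deserves attention, since it affects writes:<br />
innodb_flush_method sets how InnoDB performs synchronous I/O:<br />
1) Default: uses fsync().<br />
2) O_SYNC: opens files in sync mode; usually slower. (In MySQL the option value is spelled O_DSYNC.)<br />
3) O_DIRECT: uses direct I/O on Linux. It can improve speed significantly, especially on RAID systems, by avoiding extra data copying and double buffering (MySQL buffering plus OS buffering).</p>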
]]></content:encoded>
			<wfw:commentRss>http://fendou.org/2011/08/30/mysql%e5%86%99%e5%85%a5%e4%bc%98%e5%8c%96/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>MySQL Pagination: LIMIT Optimization</title>
		<link>http://fendou.org/2011/08/30/mysql-limit-opt/</link>
		<comments>http://fendou.org/2011/08/30/mysql-limit-opt/#comments</comments>
		<pubDate>Tue, 30 Aug 2011 14:56:38 +0000</pubDate>
		<dc:creator>崔玉松</dc:creator>
				<category><![CDATA[Excellence Article]]></category>
		<category><![CDATA[MySQL]]></category>

		<guid isPermaLink="false">http://fendou.org/?p=844</guid>
		<description><![CDATA[Pagination in MySQL is simple: LIMIT offset, length is all it takes to fetch a page. But when offset and length are large, MySQL's performance drops noticeably. 1. Subquery optimization. First locate the first row of the page; the rows whose id is greater than or equal to that row's id are the ones to fetch. Drawback: the data must be contiguous, which in effect means no WHERE condition, because a WHERE condition filters rows and breaks the contiguity. An experiment: SQL code  mysql&#62; set profiling=1; Query OK, 0 rows affected (0.00 sec) mysql&#62; select count(*) from Member; +----------+ &#124; count(*) &#124; +----------+ &#124;   169566 &#124; +----------+ 1 row in set (0.00 sec) mysql&#62; pager grep !~- PAGER set to 'grep !~-' mysql&#62; select * from Member limit 10, 100; 100 rows in set (0.00 sec) mysql&#62; select * from Member where MemberID &#62;= (select MemberID from Member limit 10,1) limit 100; 100 rows in set (0.00 sec) mysql&#62; select * from Member limit 1000, 100; 100 rows in set (0.01 sec) mysql&#62; select * from Member where MemberID &#62;= (select MemberID from Member limit 1000,1) limit 100; 100 rows in set (0.00 sec) mysql&#62; select * from Member limit 100000, 100; 100 rows in set (0.10 sec) mysql&#62; select * from Member where MemberID &#62;= (select MemberID from Member limit 100000,1) limit 100; 100 rows in set (0.02 sec) mysql&#62; nopager PAGER set to stdout mysql&#62; show profiles\G *************************** 1. row *************************** Query_ID: 1 Duration: 0.00003300 Query: select count(*) from Member *************************** 2. row *************************** Query_ID: 2 Duration: 0.00167000 Query: select * from Member limit 10, 100 *************************** 3. row *************************** Query_ID: 3 Duration: 0.00112400 Query: select * from Member where MemberID &#62;= (select MemberID from Member limit 10,1) limit 100 *************************** 4. row *************************** Query_ID: 4 Duration: 0.00263200 Query: select * from Member limit 1000, 100 *************************** 5. row *************************** Query_ID: 5 Duration: 0.00134000 Query: select * from Member where MemberID &#62;= (select MemberID from Member limit 1000,1) limit 100 *************************** 6. row *************************** Query_ID: 6...  <a href="http://fendou.org/2011/08/30/mysql-limit-opt/" class="more-link" title="Read mysql分页limit 优化">Read more &#187;</a>]]></description>
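			<content:encoded><![CDATA[<p>Pagination in MySQL is simple: LIMIT offset, length is all it takes to fetch a page. But when offset and length are large, MySQL's performance drops noticeably.</p>
<p><strong>1. Subquery optimization</strong><br />
First locate the first row of the page; the rows whose id is greater than or equal to that row's id are the ones to fetch.<br />
Drawback: the data must be contiguous, which in effect means no WHERE condition, because a WHERE condition filters rows and breaks the contiguity.</p>
<p>An experiment:</p>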
<div>
<div>
<div>SQL code</div>
</div>
<ol>
<li>mysql&gt; set profiling=1;</li>
<li>Query OK, 0 rows affected (0.00 sec)</li>
<li></li>
<li>mysql&gt; select count(*) from Member;</li>
<li>+----------+</li>
<li>| count(*) |</li>
<li>+----------+</li>
<li>|   169566 |</li>
<li>+----------+</li>
<li>1 row in set (0.00 sec)</li>
<li></li>
<li>mysql&gt; pager grep !~-</li>
<li>PAGER set to 'grep !~-'</li>
<li></li>
<li>mysql&gt; select * from Member limit 10, 100;</li>
<li>100 rows in set (0.00 sec)</li>
<li></li>
<li>mysql&gt; select * from Member where MemberID &gt;= (select MemberID from Member limit 10,1) limit 100;</li>
<li>100 rows in set (0.00 sec)</li>
<li></li>
<li>mysql&gt; select * from Member limit 1000, 100;</li>
<li>100 rows in set (0.01 sec)</li>
<li></li>
<li>mysql&gt; select * from Member where MemberID &gt;= (select MemberID from Member limit 1000,1) limit 100;</li>
<li>100 rows in set (0.00 sec)</li>
<li></li>
<li>mysql&gt; select * from Member limit 100000, 100;</li>
<li>100 rows in set (0.10 sec)</li>
<li></li>
<li>mysql&gt; select * from Member where MemberID &gt;= (select MemberID from Member limit 100000,1) limit 100;</li>
<li>100 rows in set (0.02 sec)</li>
<li></li>
<li>mysql&gt; nopager</li>
<li>PAGER set to stdout</li>
<li></li>
<li></li>
<li>mysql&gt; show profiles\G</li>
<li>*************************** 1. row ***************************</li>
<li>Query_ID: 1</li>
<li>Duration: 0.00003300</li>
<li> Query: select count(*) from Member</li>
<li></li>
<li>*************************** 2. row ***************************</li>
<li>Query_ID: 2</li>
<li>Duration: 0.00167000</li>
<li> Query: select * from Member limit 10, 100</li>
<li>*************************** 3. row ***************************</li>
<li>Query_ID: 3</li>
<li>Duration: 0.00112400</li>
<li> Query: select * from Member where MemberID &gt;= (select MemberID from Member limit 10,1) limit 100</li>
<li></li>
<li>*************************** 4. row ***************************</li>
<li>Query_ID: 4</li>
<li>Duration: 0.00263200</li>
<li> Query: select * from Member limit 1000, 100</li>
<li>*************************** 5. row ***************************</li>
<li>Query_ID: 5</li>
<li>Duration: 0.00134000</li>
<li> Query: select * from Member where MemberID &gt;= (select MemberID from Member limit 1000,1) limit 100</li>
<li></li>
<li>*************************** 6. row ***************************</li>
<li>Query_ID: 6</li>
<li>Duration: 0.09956700</li>
<li> Query: select * from Member limit 100000, 100</li>
<li>*************************** 7. row ***************************</li>
<li>Query_ID: 7</li>
<li>Duration: 0.02447700</li>
<li> Query: select * from Member where MemberID &gt;= (select MemberID from Member limit 100000,1) limit 100</li>
</ol>
</div>
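<p>The results show that once the offset exceeds about 1000, the subquery method improves performance noticeably: in the profile above it is roughly 2x faster at offset 1000 and roughly 4x faster at offset 100000.</p>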
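<p>To make the comparison easier to replay, here is a minimal sketch of the two query shapes using Python's built-in sqlite3 module instead of MySQL. The table contents, the Name column and the function names are assumptions for illustration only, and an explicit ORDER BY MemberID is added (the queries above have none) so that both forms are guaranteed to return the same rows. It demonstrates equivalence of the results, not the timings measured above.</p>

```python
import sqlite3

# Hypothetical stand-in for the Member table; only MemberID matters here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Member (MemberID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO Member (MemberID, Name) VALUES (?, ?)",
    [(i, "user%d" % i) for i in range(1, 5001)],
)

def page_plain(offset, count):
    # Plain form: the engine steps over `offset` full rows before returning any.
    sql = "SELECT * FROM Member ORDER BY MemberID LIMIT ?, ?"
    return conn.execute(sql, (offset, count)).fetchall()

def page_subquery(offset, count):
    # Subquery form: find the first key of the page, then read forward from it.
    sql = (
        "SELECT * FROM Member WHERE MemberID >= "
        "(SELECT MemberID FROM Member ORDER BY MemberID LIMIT ?, 1) "
        "ORDER BY MemberID LIMIT ?"
    )
    return conn.execute(sql, (offset, count)).fetchall()

rows_plain = page_plain(1000, 100)
rows_subquery = page_subquery(1000, 100)
print(rows_plain == rows_subquery, len(rows_plain))
```

<p>The subquery only has to walk the primary key to find the starting MemberID, which is presumably why it wins at large offsets in the MySQL timings.</p>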
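<p><strong>2. Inverted-table optimization</strong><br />
The inverted-table method works much like building an index: a separate table maintains the page numbers, and an efficient join through that table fetches the data.</p>
<p>Drawbacks: it only suits data sets whose row count is fixed; rows cannot be deleted, and the page table is hard to maintain.</p>
<p>For details, see <a href="http://blog.chinaunix.net/u/29134/showart_1333566.html" target="_blank">http://blog.chinaunix.net/u/29134/showart_1333566.html</a></p>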
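<p>As a rough illustration of the idea (not the implementation from the linked article), the sketch below keeps a page table that stores the first key of every page and joins through it. It runs on Python's built-in sqlite3 module; the MemberPage table, its columns, the page size and the function name are all hypothetical.</p>

```python
import sqlite3

PAGE_SIZE = 40  # assumed page length

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Member (MemberID INTEGER PRIMARY KEY, Name TEXT)")
# Keys with gaps, to show the page table does not rely on contiguous ids.
conn.executemany(
    "INSERT INTO Member VALUES (?, ?)",
    [(i * 3, "user%d" % i) for i in range(1, 1001)],
)

# Hypothetical page table: one row per page, holding that page's first key.
conn.execute("CREATE TABLE MemberPage (PageNo INTEGER PRIMARY KEY, FirstID INTEGER)")
ids = [r[0] for r in conn.execute("SELECT MemberID FROM Member ORDER BY MemberID")]
conn.executemany(
    "INSERT INTO MemberPage VALUES (?, ?)",
    [(n + 1, ids[i]) for n, i in enumerate(range(0, len(ids), PAGE_SIZE))],
)

def fetch_page(page_no):
    # Join through the page table, so no offset scan is needed.
    sql = (
        "SELECT m.MemberID, m.Name FROM MemberPage p "
        "JOIN Member m ON m.MemberID >= p.FirstID "
        "WHERE p.PageNo = ? ORDER BY m.MemberID LIMIT ?"
    )
    return conn.execute(sql, (page_no, PAGE_SIZE)).fetchall()

page = fetch_page(3)
print(page[0][0], page[-1][0], len(page))
```

<p>The drawback listed above is visible here: any insert or delete in Member shifts the page boundaries, so MemberPage would have to be rebuilt.</p>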
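<p><strong>3. Reverse-lookup optimization</strong><br />
When the offset exceeds half of the total row count, sort in the opposite direction first, so that the offset is counted from the other end and becomes small again.</p>
<p>Drawbacks: optimizing the order by is troublesome and needs an extra index, which slows down data modification; the total row count must be known; and the method only helps when the offset is more than half of the data.</p>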
<div>Quote</div>
<div>Limit offset calculation:<br />
Forward lookup: (current page - 1) * page size<br />
Reverse lookup: total rows - current page * page size</div>
<p>Let's run an experiment and see how it performs.</p>
<p>Total rows: 1,628,775<br />
Rows per page: 40<br />
Total pages: 1,628,775 / 40 = 40720 (rounded up)<br />
Middle page: 40720 / 2 = 20360</p>
<p>Page 21000<br />
Forward lookup SQL:</p>
<div>
<div>
<div>SQL code</div>
</div>
<ol>
<li>SELECT * FROM `abc` WHERE `BatchID` = 123 LIMIT 839960, 40</li>
</ol>
</div>
<p>Time: 1.8696 s</p>
<p>Reverse lookup SQL:</p>
<div>
<div>
<div>SQL code</div>
</div>
<ol>
<li>SELECT * FROM `abc` WHERE `BatchID` = 123 ORDER BY InputDate DESC LIMIT 788775, 40</li>
</ol>
</div>
<p>Time: 1.8336 s</p>
<p>Page 30000<br />
Forward lookup SQL:</p>
<div>
<div>
<div>SQL code</div>
</div>
<ol>
<li>SELECT * FROM `abc` WHERE `BatchID` = 123 LIMIT 1199960, 40</li>
</ol>
</div>
<p>Time: 2.6493 s</p>
<p>Reverse lookup SQL:</p>
<div>
<div>
<div>SQL code</div>
</div>
<ol>
<li>SELECT * FROM `abc` WHERE `BatchID` = 123 ORDER BY InputDate DESC LIMIT 428775, 40</li>
</ol>
</div>
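<p>Time: 1.0035 s</p>
<p>Note that the reverse lookup returns its rows in descending (desc) order, and that InputDate is the insertion time of each record. A composite index that includes the primary key could be used instead, but that is less convenient.</p>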
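<p>Because the reverse lookup comes back in descending order, the application has to flip the rows before displaying them. The sketch below shows this on a small stand-in table using Python's built-in sqlite3 module. The table contents and function names are assumptions, the forward query is given an explicit ORDER BY InputDate ASC (the test query above has none), and the row count is chosen as a multiple of the page size so that the partial last page does not need special handling.</p>

```python
import sqlite3

PAGE_SIZE = 40
TOTAL = 1000  # small stand-in row count, a multiple of PAGE_SIZE

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE abc (id INTEGER PRIMARY KEY, InputDate INTEGER)")
# InputDate grows with insertion order, as the note above assumes.
conn.executemany(
    "INSERT INTO abc VALUES (?, ?)",
    [(i, 1000 + i) for i in range(1, TOTAL + 1)],
)

def forward_page(page):
    offset = (page - 1) * PAGE_SIZE
    sql = "SELECT id FROM abc ORDER BY InputDate ASC LIMIT ?, ?"
    return [r[0] for r in conn.execute(sql, (offset, PAGE_SIZE))]

def reverse_page(page):
    offset = TOTAL - page * PAGE_SIZE
    sql = "SELECT id FROM abc ORDER BY InputDate DESC LIMIT ?, ?"
    rows = [r[0] for r in conn.execute(sql, (offset, PAGE_SIZE))]
    rows.reverse()  # flip back to ascending order for display
    return rows

same = forward_page(20) == reverse_page(20)
print(same)
```

<p>For page 20 of 25 the forward query skips 760 rows while the reverse query skips only 200, which is the whole point of the method.</p>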
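<p><strong>4. Capping the limit offset</strong><br />
Cap the limit offset below some fixed number; anything beyond that number is treated as having no data. I recall an Alibaba DBA saying this is what they do.</p>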
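<p>On the application side the cap is only a few lines. A minimal sketch, assuming a hypothetical cap of 10000 rows and made-up function names:</p>

```python
MAX_OFFSET = 10000  # hypothetical cap; choose it to fit the workload

def capped_offset(page, page_size, max_offset=MAX_OFFSET):
    """Return the limit offset for a page, or None when it exceeds the cap."""
    offset = (page - 1) * page_size
    if offset > max_offset:
        # Beyond the cap: behave as if there were no data, without querying.
        return None
    return offset

print(capped_offset(1, 40), capped_offset(251, 40), capped_offset(252, 40))
```

<p>A caller would skip the query and show an empty page whenever None is returned.</p>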
<p><strong>5. Index-only lookup</strong><br />
<a href="http://willko.iteye.com/blog/670120" target="_blank">http://willko.iteye.com/blog/670120</a></p>
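<p>Summary: every one of these limit optimizations comes with a fair number of restrictions, so whether to use them has to be decided case by case. Pages that far back are rarely viewed by anyone anyway...</p>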
]]></content:encoded>
			<wfw:commentRss>http://fendou.org/2011/08/30/mysql-limit-opt/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Unreachable for three days because of a DNS resolution problem</title>
		<link>http://fendou.org/2011/08/29/down-3-days/</link>
		<comments>http://fendou.org/2011/08/29/down-3-days/#comments</comments>
		<pubDate>Mon, 29 Aug 2011 15:46:33 +0000</pubDate>
		<dc:creator>崔玉松</dc:creator>
				<category><![CDATA[Life Diary]]></category>
		<category><![CDATA[Ramblings]]></category>

		<guid isPermaLink="false">http://fendou.org/?p=840</guid>
		<description><![CDATA[If you can read this post, the domain's new DNS records have taken effect. Everything really is interesting in the Celestial Empire. This blog's domain, fendou.org, was originally registered in China and later transferred to GoDaddy. Earlier this year there were rumors that a whitelist was coming, and soon afterwards the site really did become unreachable, so I moved the DNS servers to DNSPOD in China. Fortunately DNSPOD resolution does not require ICP filing, but I was too lazy to change the IP and simply pointed a CNAME at another domain that was still resolved through GoDaddy. That domain has now been caught by the whitelist as well. I have been tied up with all sorts of troubles these past few days and barely looked at the blog, so it stayed down for several days. I have now switched to an A record that points directly at a host in China as a stopgap, and over the weekend I will probably move the site to GAE (Google App Engine). For reasons everyone knows, a platform as good as GAE has been "harmonized", so I will also need an overseas server as a reverse proxy. Another headache is that the blog uses the bare domain without www, and GAE cannot bind a bare domain, which is yet another misfortune... At worst, moving to GAE just means the site cannot be reached normally from within China; at least the data will still be there. As long as Google has not gone under, everything is safe. On the day Google does go under, perhaps harmonization will vanish for good and be replaced by democracy.]]></description>
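			<content:encoded><![CDATA[<p>If you can read this post, the domain's new DNS records have taken effect. Everything really is interesting in the Celestial Empire.</p>
<p>This blog's domain, fendou.org, was originally registered in China and later transferred to GoDaddy. Earlier this year there were rumors that a whitelist was coming, and soon afterwards the site really did become unreachable, so I moved the DNS servers to DNSPOD in China. Fortunately DNSPOD resolution does not require ICP filing, but I was too lazy to change the IP and simply pointed a CNAME at another domain that was still resolved through GoDaddy. That domain has now been caught by the whitelist as well. I have been tied up with all sorts of troubles these past few days and barely looked at the blog, so it stayed down for several days. I have now switched to an A record that points directly at a host in China as a stopgap, and over the weekend I will probably move the site to GAE (Google App Engine). For reasons everyone knows, a platform as good as GAE has been "harmonized", so I will also need an overseas server as a reverse proxy. Another headache is that the blog uses the bare domain without www, and GAE cannot bind a bare domain, which is yet another misfortune... At worst, moving to GAE just means the site cannot be reached normally from within China; at least the data will still be there. As long as Google has not gone under, everything is safe. On the day Google does go under, perhaps harmonization will vanish for good and be replaced by democracy.</p>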
]]></content:encoded>
			<wfw:commentRss>http://fendou.org/2011/08/29/down-3-days/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

