<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Cartesian Faith</title>
	<atom:link href="http://cartesianfaith.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://cartesianfaith.wordpress.com</link>
	<description>Insights of a modern alchemist</description>
	<lastBuildDate>Sat, 18 May 2013 14:13:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='cartesianfaith.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://1.gravatar.com/blavatar/f0e8418b903519c8b4ad1d02c3a01b0f?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>Cartesian Faith</title>
		<link>http://cartesianfaith.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://cartesianfaith.wordpress.com/osd.xml" title="Cartesian Faith" />
	<atom:link rel='hub' href='http://cartesianfaith.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Better logging in R (aka futile.logger 1.3.0 released)</title>
		<link>http://cartesianfaith.wordpress.com/2013/03/10/better-logging-in-r-aka-futile-logger-1-3-0-released/</link>
		<comments>http://cartesianfaith.wordpress.com/2013/03/10/better-logging-in-r-aka-futile-logger-1-3-0-released/#comments</comments>
		<pubDate>Sun, 10 Mar 2013 20:27:41 +0000</pubDate>
		<dc:creator>Brian Lee Yung Rowe</dc:creator>
				<category><![CDATA[error messages]]></category>
		<category><![CDATA[futile]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://cartesianfaith.wordpress.com/?p=689</guid>
		<description><![CDATA[In many languages logging is now part of the batteries included with a language. This isn&#8217;t yet the case in &#8230;<p><a href="http://cartesianfaith.wordpress.com/2013/03/10/better-logging-in-r-aka-futile-logger-1-3-0-released/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=cartesianfaith.wordpress.com&#038;blog=6379897&#038;post=689&#038;subd=cartesianfaith&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>In many languages logging is now part of the batteries included with a language. This isn&#8217;t yet the case in R so most people make do with <span style="text-decoration:underline;">cat</span> commands laced through their code. Then after development, when the system is being run for real, a lot of those <span style="text-decoration:underline;">cat</span> statements are commented out so they don&#8217;t fill up log files and hide real error messages. When there is a bug, the <span style="text-decoration:underline;">cat</span> commands are pulled down from the attic and so on. It would be far easier to have a simple logging utility that managed the various debug statements in your code such that they could all be turned on and off as a set or by severity.</p>
<p>This dream is no longer futile. The latest release of my package <a href="http://cran.r-project.org/web/packages/futile.logger/index.html">futile.logger</a> has been rewritten with usability in mind (and is now built on lambda.r). It comes with batteries included and requires zero configuration to work.</p>
<h3>Basics</h3>
<p>You can use futile.logger right out of the box as a replacement for <span style="text-decoration:underline;">cat</span>. Here are some examples. Notice that futile.logger decorates log statements with generally useful information, like the logging level and a timestamp.</p>
<pre class="brush: r; title: ; notranslate">
library(futile.logger)
flog.info(&quot;My first log statement with futile.logger&quot;)
flog.warn(&quot;This statement has higher severity&quot;)
flog.fatal(&quot;This one is really scary&quot;)
</pre>
<p>Format strings are built into the logging system, so log statements can be dynamic based on the data.</p>
<pre class="brush: r; title: ; notranslate">
flog.info(&quot;This is my %s log statement, %s&quot;, 'second', 'ever')
x &lt;- rnorm(400)
flog.info(&quot;The variance of the sample is %s&quot;, var(x))
</pre>
<p>Use the flog.threshold function to choose which messages should be displayed. If you no longer want to see INFO-level messages, then change the threshold to WARN.</p>
<pre class="brush: r; title: ; notranslate">
flog.threshold(WARN)
flog.info(&quot;Log statement %s is hidden!&quot;, 3)
flog.warn(&quot;However warning messages will still appear&quot;)
</pre>
<h3>Concepts</h3>
<p>To take full advantage of futile.logger, it is important to understand the handful of concepts that the package introduces. For anyone familiar with log4j or its myriad clones this should be familiar.</p>
<h4>Loggers</h4>
<p>A <em>logger</em> is simply a namespace bound to a <em>threshold</em>, an <em>appender</em>, and a <em>layout</em>. When futile.logger is loaded, a single logger is created, which is known as the <span style="text-decoration:underline;">ROOT</span> logger. All other loggers inherit from the <span style="text-decoration:underline;">ROOT</span> logger in terms of namespace and also configuration. The <span style="text-decoration:underline;">ROOT</span> logger has a threshold of INFO, logs to standard out, and has a simple layout for the messages.</p>
<p>Loggers are configured automatically whenever they are referenced (for example when changing the threshold, or logging to a logger) inheriting the settings of the <span style="text-decoration:underline;">ROOT</span> logger. Suppose you want a separate namespace for data i/o. You can make log messages that reference a logger <span style="text-decoration:underline;">data.io</span> simply by specifying a different logger name.</p>
<pre>&gt; flog.threshold(INFO)
NULL
&gt; flog.info("Loading data.frame", name='data.io')</pre>
<p>Even though no logger was explicitly defined, futile.logger knows how to print messages since the logger <span style="text-decoration:underline;">data.io</span> inherits from <span style="text-decoration:underline;">ROOT</span>. Having a new logger is nice, but it&#8217;s not particularly useful if it behaves identically to the default logger. What if we want to see TRACE level messages for our data i/o but not clutter up the rest of our logging?</p>
<pre>&gt; flog.threshold(TRACE, name='data.io')
NULL
&gt; 
&gt; flog.trace("Connecting to data store")</pre>
<p>Whoops, this is going to the ROOT logger, so obviously it won&#8217;t print (since it&#8217;s still at INFO). Let&#8217;s route the message to the <span style="text-decoration:underline;">data.io</span> logger we referenced earlier.</p>
<pre>&gt; flog.trace("Connecting to data store", name='data.io')
TRACE [2013-03-10 16:03:40] Connecting to data store</pre>
<p>That looks better. This is how the hierarchy works since futile.logger will use values specified higher in the hierarchy if they don&#8217;t exist for the current logger. We can even change the appender for <span style="text-decoration:underline;">data.io</span> to write to a file instead.</p>
<pre>&gt; flog.appender(appender.file("data.io.log"), name="data.io")
NULL
&gt; flog.trace("Connecting to data store", name='data.io')</pre>
<p>What about intermediate points in the hierarchy? Suppose we have other data operations logging to the logger <span style="text-decoration:underline;">data</span>. We want these to be at TRACE, but we want them to continue writing to the console.</p>
<pre>&gt; flog.threshold(TRACE, name='data')
NULL
&gt; flog.trace("Merging data.frames", name='data')
TRACE [2013-03-10 16:10:14] Merging data.frames
&gt; flog.trace("Connecting to data store", name='data.io')</pre>
<p>So now the logger data will print all log messages to console, while data.io will write to a file. This should give you a good idea of how to manipulate loggers in futile.logger.</p>
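<p>The lookup rule described above can be sketched in a few lines of base R. This is only an illustration, not futile.logger&#8217;s actual implementation: the <span style="text-decoration:underline;">settings</span> list and the <span style="text-decoration:underline;">resolve_setting</span> helper are hypothetical names, and the sketch assumes that logger names are dot-separated with <span style="text-decoration:underline;">ROOT</span> as the final fallback.</p>
<pre class="brush: r; title: ; notranslate">
# Hypothetical store of explicitly configured settings, keyed by logger name
settings &lt;- list(
  ROOT = list(threshold = &quot;INFO&quot;, appender = &quot;console&quot;),
  data = list(threshold = &quot;TRACE&quot;),
  &quot;data.io&quot; = list(appender = &quot;file&quot;)
)

# Walk from the most specific name up to ROOT until the field is found
resolve_setting &lt;- function(name, field) {
  parts &lt;- strsplit(name, &quot;.&quot;, fixed = TRUE)[[1]]
  while (length(parts) &gt; 0) {
    candidate &lt;- paste(parts, collapse = &quot;.&quot;)
    value &lt;- settings[[candidate]][[field]]
    if (!is.null(value)) return(value)
    parts &lt;- parts[-length(parts)]
  }
  settings[[&quot;ROOT&quot;]][[field]]
}

resolve_setting(&quot;data.io&quot;, &quot;threshold&quot;)  # found on the parent 'data'
resolve_setting(&quot;data.io&quot;, &quot;appender&quot;)   # set on 'data.io' itself
resolve_setting(&quot;model&quot;, &quot;threshold&quot;)    # falls back to ROOT
</pre>
<p>In this toy configuration <span style="text-decoration:underline;">data.io</span> picks up its threshold from <span style="text-decoration:underline;">data</span> while keeping its own appender, which is the same division of labor shown in the session above.</p>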
<h4>Logging Levels and Thresholds</h4>
<p>It&#8217;s probably clear that there are multiple <em>log levels</em> available. Each log message has an associated log level that ranges from TRACE to FATAL. The levels form a hierarchy that not only indicates the severity of the message but also when it should be displayed. This is controlled by the <em>threshold</em>, which only allows messages with a severity greater than or equal to the threshold to be displayed. When futile.logger is loaded, the default log level is INFO. To change the logging threshold use the flog.threshold function. The function has two arguments: the new threshold and the name of the logger. The available thresholds are the same as the log levels: TRACE, DEBUG, INFO, WARN, ERROR, FATAL. Choose the threshold that you want to see and all messages of that severity and higher will display.</p>
<p>This means that the default INFO threshold will display all log messages with a log level of INFO, WARN, ERROR, FATAL. If you only want to see errors, then set the threshold to ERROR.</p>
<pre class="brush: r; title: ; notranslate">
flog.threshold(ERROR)
flog.info(&quot;Will print: %s&quot;, FALSE)
flog.warn(&quot;Will print: %s&quot;, FALSE)
flog.error(&quot;Will print: %s&quot;, TRUE)
flog.fatal(&quot;Will print: %s&quot;, TRUE)
</pre>
<p>With this simple mechanism you can add log statements in your code and control the display of the messages by simply changing the logging threshold. Hence for development keep the threshold on INFO, for debugging change to TRACE, and for production set to WARN.</p>
<p>The name argument determines the logger itself, as discussed above.</p>
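<p>The threshold rule amounts to a single comparison between two ranks. Here is a minimal base R sketch of that idea. The numeric ranks and the <span style="text-decoration:underline;">should_log</span> helper are assumptions made for illustration and are not taken from the package.</p>
<pre class="brush: r; title: ; notranslate">
# Hypothetical severity ranks: a larger number means a more verbose level
severity &lt;- c(FATAL = 1, ERROR = 2, WARN = 3, INFO = 4, DEBUG = 5, TRACE = 6)

# A message is shown when it is at least as severe as the threshold
should_log &lt;- function(level, threshold) {
  severity[[level]] &lt;= severity[[threshold]]
}

should_log(&quot;WARN&quot;, &quot;INFO&quot;)   # TRUE
should_log(&quot;DEBUG&quot;, &quot;INFO&quot;)  # FALSE

# Levels that pass an ERROR threshold
names(severity)[severity &lt;= severity[[&quot;ERROR&quot;]]]
</pre>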
<h4>Appenders</h4>
<p>An <em>appender</em> defines where a log message goes. The default is to standard out, but it is also possible to write log messages to files, URLs, message queues, etc. Writing your own appender is as simple as creating a function that has the following signature.</p>
<pre class="brush: r; title: ; notranslate">
appender.fn &lt;- function(line) { }
</pre>
<p>Assigning it to a given logger follows the same convention as for layouts.</p>
<pre class="brush: r; title: ; notranslate">
flog.appender(appender.fn)
flog.appender(appender.fn, name='test.logger')
</pre>
<p>Similarly calling the function without an explicit logger will bind this appender to the <span style="text-decoration:underline;">ROOT</span> logger.</p>
<p>The following two appenders are provided out of the box in futile.logger. Each function returns a function, so when binding an appender to a logger, be sure to execute the top level function.</p>
<ul>
<li>appender.console()</li>
<li>appender.file(filename)</li>
</ul>
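<p>As an illustration of the one-argument appender signature described above, here is a sketch of a custom appender that keeps log lines in memory, which can be handy in unit tests. The name <span style="text-decoration:underline;">appender.memory</span> and the <span style="text-decoration:underline;">get</span> attribute are hypothetical. Like the built-in appenders, the top level function returns the function that does the work.</p>
<pre class="brush: r; title: ; notranslate">
# Hypothetical appender that collects log lines instead of printing them
appender.memory &lt;- function() {
  lines &lt;- character(0)
  append_line &lt;- function(line) {
    lines &lt;&lt;- c(lines, line)
    invisible(NULL)
  }
  # Expose the captured lines through an attribute on the appender
  attr(append_line, &quot;get&quot;) &lt;- function() lines
  append_line
}

mem &lt;- appender.memory()
mem(&quot;INFO [2013-03-10 16:03:40] Loading data.frame\n&quot;)
mem(&quot;WARN [2013-03-10 16:03:41] Missing values found\n&quot;)
attr(mem, &quot;get&quot;)()
</pre>
<p>Since <span style="text-decoration:underline;">mem</span> is a function of a single line, it should be possible to bind it to a logger with flog.appender(mem) in the same way as the built-in appenders.</p>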
<h4>Layouts</h4>
<p>Suppose you want to display log messages in a different format? To do this you would create a custom <em>layout</em> function, which is responsible for returning a string that represents the log message. The function interface is</p>
<pre class="brush: r; title: ; notranslate">
layout.fn &lt;- function(level, message, ...) { }
</pre>
<p>where the ellipsis argument represents any arguments passed via the logging functions described above. You set the layout by assigning the layout function to the given logger.</p>
<pre class="brush: r; title: ; notranslate">
flog.layout(layout.fn)
flog.layout(layout.fn, name='test.logger')
</pre>
<p>If you don&#8217;t provide a logger, futile.logger will use the <span style="text-decoration:underline;">ROOT</span> logger. This means all loggers will be impacted by the change in layout (except loggers that explicitly defined a different layout).</p>
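<p>To make the layout interface concrete, here is a sketch of a custom layout that drops the timestamp. The name <span style="text-decoration:underline;">layout.pipe</span> is hypothetical, and the sketch assumes that the level arrives as a named value whose name is the level label, so check that assumption against the package before relying on it.</p>
<pre class="brush: r; title: ; notranslate">
# Hypothetical layout: &quot;LEVEL | message&quot;, with sprintf-style substitution.
# Assumes 'level' is a named value, e.g. c(INFO = 6).
layout.pipe &lt;- function(level, message, ...) {
  # Only run sprintf when format arguments were actually supplied
  if (length(list(...)) &gt; 0) message &lt;- sprintf(message, ...)
  sprintf(&quot;%s | %s\n&quot;, names(level), message)
}

cat(layout.pipe(c(INFO = 6), &quot;Loaded %s rows&quot;, 10))
cat(layout.pipe(c(WARN = 4), &quot;Disk is nearly full&quot;))
</pre>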
<h3>Package Support</h3>
<p>If you happen to use a package that uses futile.logger, you can control logging of messages interactively on a per package basis. That means you, as the package user, decide how much logging information you wish to see for a given package.</p>
<pre class="brush: r; title: ; notranslate">
&gt; library(tawny)
&gt; library(fractalrock)
&gt; prices &lt;- getPortfolioPrices(LETTERS[1:10], 100)
&gt; returns &lt;- apply(prices,2,Delt)[-1,]
&gt; s &lt;- cov.shrink(as.xts(returns,order.by=index(prices)[-1]))
INFO [2013-03-08 09:52:07] Got intensity k = 140.006097751888 and coefficient d = 1
&gt; flog.threshold(WARN, name='tawny')
NULL
&gt; s &lt;- cov.shrink(as.xts(returns,order.by=index(prices)[-1]))
&gt; flog.info(&quot;Got covariance matrix&quot;)
INFO [2013-03-08 09:55:25] Got covariance matrix
</pre>
<p>Hence by using futile.logger, package authors can balance their debugging needs with a user&#8217;s usage needs. It also means that for debugging issues, a user can change the log threshold for the package and give a more detailed output to the package maintainer.</p>
<h3>Conclusion</h3>
<p>Futile.logger is now more robust and easier to use. Install from CRAN or read more documentation and the source on the <a href="https://github.com/muxspace/futile.logger">github</a> page.</p>
]]></content:encoded>
			<wfw:commentRss>http://cartesianfaith.wordpress.com/2013/03/10/better-logging-in-r-aka-futile-logger-1-3-0-released/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6b4bc36b8607b256e0ec8e320291802a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">muxspace</media:title>
		</media:content>
	</item>
		<item>
		<title>Lambda.r 1.1.1 released (and introducing the EMPTY keyword)</title>
		<link>http://cartesianfaith.wordpress.com/2013/03/06/lambda-r-1-1-1-released-and-introducing-the-empty-keyword/</link>
		<comments>http://cartesianfaith.wordpress.com/2013/03/06/lambda-r-1-1-1-released-and-introducing-the-empty-keyword/#comments</comments>
		<pubDate>Wed, 06 Mar 2013 18:44:25 +0000</pubDate>
		<dc:creator>Brian Lee Yung Rowe</dc:creator>
				<category><![CDATA[lambda.r]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://cartesianfaith.wordpress.com/?p=690</guid>
		<description><![CDATA[I&#8217;m pleased to announce that lambda.r 1.1.1 is now available on CRAN. This release is mostly a bug fix release, &#8230;<p><a href="http://cartesianfaith.wordpress.com/2013/03/06/lambda-r-1-1-1-released-and-introducing-the-empty-keyword/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=cartesianfaith.wordpress.com&#038;blog=6379897&#038;post=690&#038;subd=cartesianfaith&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>I&#8217;m pleased to announce that lambda.r 1.1.1 is now available on CRAN. This release is mostly a bug fix release, although a few important enhancements were included.</p>
<ul>
<li>[bug] Support Function in every type position (only supported for return type)</li>
<li>[bug] Auto-replacing a function with 0 arguments fails</li>
<li>[bug] Fix type inheritance</li>
<li>[new] Functions that return non-scalar values work as default values on arguments</li>
<li>[new] Support pattern matching on NULL and NA</li>
<li>[new] Support pattern matching on special symbol EMPTY</li>
</ul>
<h3>Pattern Matching for NA and NULL</h3>
<p>Most significant are the improvements to the pattern matching semantics. Pattern matching now supports NA and NULL directly. This is particularly useful for programmatic control when a specific function signature is required but the argument value is non-deterministic. This can happen when accessing non-existent list elements as well. Suppose you want to forecast a time series. You want to choose the forecasting method based on whether the data is seasonal or not. A classification technique is used for this purpose and sets the period or NULL if it is not seasonal.</p>
<p>Traditional imperative code would run the classification, check its output and then use a conditional to execute the seasonal or non-seasonal forecasting routine. A functional approach would use function clauses to control the flow.</p>
<pre class="brush: r; title: ; notranslate">
forecast_ts(x, NULL) %as% {
  # non-seasonal forecast
}

forecast_ts(x, period) %as% {
  # seasonal forecast
}

period &lt;- classify_ts(x) # NULL or numeric
forecast_ts(x, period)
</pre>
<p>Obviously the same thing can be accomplished using an explicit guard statement, but pattern matching has an elegant simplicity to it that efficiently communicates the intent of the logic.</p>
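<p>For comparison, the imperative flow described above might look like the following base R sketch. The classifier and the two forecasting helpers are hypothetical stand-ins, included only so that the example runs on its own.</p>
<pre class="brush: r; title: ; notranslate">
# Hypothetical stand-ins so the sketch is self-contained
classify_ts &lt;- function(x) if (length(x) &gt;= 24) 12 else NULL
forecast_nonseasonal &lt;- function(x) mean(x)
forecast_seasonal &lt;- function(x, period) mean(tail(x, period))

# Imperative control flow: branch explicitly on the classifier's output
forecast_ts &lt;- function(x) {
  period &lt;- classify_ts(x)
  if (is.null(period)) {
    forecast_nonseasonal(x)
  } else {
    forecast_seasonal(x, period)
  }
}

forecast_ts(1:10)  # takes the non-seasonal branch
forecast_ts(1:24)  # takes the seasonal branch
</pre>
<p>The branching logic lives inside the function body here, whereas the function clauses move it into the signature.</p>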
<p>Behind the scenes these are additional parse transforms that take into consideration the special nature of these constants (and how you test for them). At some point I want to generalize the parse transform machinery so anyone can develop their own set of transforms (just like in Erlang).</p>
<h3>Introducing the EMPTY Pattern</h3>
<p>I&#8217;ve also introduced a new constant called EMPTY, which allows you to pattern match on empty lists and vectors (or anything with 0 length). This means recursive definitions and other iterative methods against vectors and lists work as expected.</p>
<pre class="brush: r; title: ; notranslate">
fold(f, EMPTY, acc) %as% acc
fold(f, x, acc) %as% { fold(f, x[-1], f(x[[1]], acc)) }

plus(x,y) %as% { x + y }

x &lt;- 2
n &lt;- 0:10
fold(plus, x^n/factorial(n), 0)
</pre>
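<p>The terms in the example above are the first eleven terms of the Taylor series of e<sup>x</sup> at x = 2, so the fold should come out close to exp(2). As a cross-check, the same sum can be written in base R with Reduce. This is a sketch for comparison and is not part of lambda.r.</p>
<pre class="brush: r; title: ; notranslate">
x &lt;- 2
n &lt;- 0:10
terms &lt;- x^n / factorial(n)

# Left fold over the terms, mirroring fold(plus, terms, 0)
total &lt;- Reduce(function(acc, term) term + acc, terms, 0)

total           # partial sum of the series
exp(x) - total  # truncation error, small and positive
</pre>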
<p>You can also capture situations where empty sets are the result of set operations using EMPTY. The clean declarative aspect of this notation makes your analytical code easier to understand by removing the overhead of data management and manipulation.</p>
<p>Full details and source are available at: <a href="https://github.com/muxspace/lambda.r">https://github.com/muxspace/lambda.r</a></p>
]]></content:encoded>
			<wfw:commentRss>http://cartesianfaith.wordpress.com/2013/03/06/lambda-r-1-1-1-released-and-introducing-the-empty-keyword/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6b4bc36b8607b256e0ec8e320291802a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">muxspace</media:title>
		</media:content>
	</item>
		<item>
		<title>Lambda.r 1.1.0 released</title>
		<link>http://cartesianfaith.wordpress.com/2013/01/25/lambda-r-1-1-0-released/</link>
		<comments>http://cartesianfaith.wordpress.com/2013/01/25/lambda-r-1-1-0-released/#comments</comments>
		<pubDate>Sat, 26 Jan 2013 02:26:36 +0000</pubDate>
		<dc:creator>Brian Lee Yung Rowe</dc:creator>
				<category><![CDATA[functional programming]]></category>
		<category><![CDATA[lambda.r]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[type variables]]></category>

		<guid isPermaLink="false">http://cartesianfaith.wordpress.com/?p=684</guid>
		<description><![CDATA[This is a quick post to announce lambda.r version 1.1.0 is released and available on CRAN. This release has a &#8230;<p><a href="http://cartesianfaith.wordpress.com/2013/01/25/lambda-r-1-1-0-released/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=cartesianfaith.wordpress.com&#038;blog=6379897&#038;post=684&#038;subd=cartesianfaith&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>This is a quick post to announce <a href="https://github.com/muxspace/lambda.r">lambda.r</a> version 1.1.0 is released and available on <a href="http://cran.r-project.org/web/packages/lambda.r/index.html">CRAN</a>.<sup>1</sup> This release has a handful of important new features and bug fixes:</p>
<ul>
<li>[new] Type variables in type constraints</li>
<li>[new] Auto-replacement of function clauses</li>
<li>[bug] Function types break in type constraints</li>
<li>[bug] Zero argument functions don&#8217;t dispatch properly</li>
</ul>
<h3>Type Variables</h3>
<p>I&#8217;ll discuss type variables in full in a separate post, but the basic idea is that you can retain polymorphism of functions by using type variables instead of concrete types. In other words, type variables define the relationship between arguments but not the actual type. Take for instance the Heaviside step function. This function will evaluate equally well for a float, integer, or double. (In R, these are all represented by numeric, so this is somewhat contrived). The output of the function is 0, 0.5 (if x == 0), or 1. Essentially the return type should match the input type.</p>
<pre class="brush: r; title: ; notranslate">

heaviside(n) %::% a : a
heaviside(n) %when% { n &lt; 0 } %as% 0
heaviside(0) %as% 0.5
heaviside(n) %as% 1

</pre>
<p>Suppose instead the output of the function is 0 or 1. We can represent the return value as a logical. We still don&#8217;t care about the input type, so we can define the type constraint as</p>
<pre class="brush: r; title: ; notranslate">

heaviside(n) %::% a : logical
heaviside(n) %when% { n &lt;= 0 } %as% FALSE
heaviside(n) %as% TRUE

</pre>
<p>In short, type variables are another tool to manage how functions are dispatched. Used in conjunction with concrete types, you can achieve generality while preserving granularity.</p>
<h3>Auto-Replacement of Function Clauses</h3>
<p>Before this release, overwriting a specific function clause required either sealing the definition or deleting it from the environment. If you skipped this step then new function clauses would continue to be appended to the function. Not only was this annoying, it also prevented you from interactively modifying function clauses. Lambda.r is now smart enough to recognize existing function clauses and replace the specific clause.</p>
<pre>&gt; fib(0) %as% 5
&gt; fib(1) %as% 2
&gt; fib(n) %as% { fib(n-1) + fib(n-2) }
&gt;
&gt; fib(0) %as% 1
&gt; fib(1) %as% 1
&gt; 
&gt; fib(5)
[1] 8</pre>
<p>The one exception is when there are two function signatures only differentiated by type. In this situation, lambda.r has no way of knowing which clause to replace. The solution is that there is always one type constraint in scope. Hence any ties will be resolved by the type constraint that is in scope. To set the type constraint that is in scope for a function, simply redeclare the type constraint. Let&#8217;s define a simple function generator that multiplies an input by some number.</p>
<pre class="brush: r; title: ; notranslate">
times.n(n) %::% numeric : Function
times.n(n=1) %as% { function(x) x + n }

times.n(n) %::% character : Function
times.n(n) %as% { times.n(as.numeric(n)) }
</pre>
<p>We source this and try it out for the default case.</p>
<pre>&gt; f &lt;- times.n()
&gt; f(4)
[1] 5</pre>
<p>That should have been 4, so something is off. Let&#8217;s check it out with a different multiplier.</p>
<pre>&gt; f &lt;- times.n(2)
&gt; f(4)
[1] 6</pre>
<p>Whoops, it looks like we have a bug. We need to update the first function clause to use * instead of +. Since the two clauses have the same function signature we need to tell lambda.r which type constraint is in scope.</p>
<pre>&gt; times.n(n) %::% numeric:Function
&gt; times.n(n) %as% { function(x) x * n }
&gt; f &lt;- times.n(2)
&gt; f(4)
[1] 8</pre>
<p>Hence functions can be interactively modified as well as re-sourced with the same behavior. It does imply that if you use type constraints in a function definition, then you need to use them consistently in that function definition.</p>
<p>In some ways auto-replace should merely produce a yawn as a reaction. This is because the behavior is what you expect anyway. While the implementation is non-trivial, my hope is that it is an obvious, almost trivial feature.</p>
<p><sup>1</sup> There are actually two versions: 1.1.0-2, which supports the 2.15.x R series, and 1.1.0-3, which is compatible with the 3.0.x R series. Selection of versions should be automatic. Once R 3 is released in the spring, I plan on supporting the 2.x series for the remainder of the year.</p>
]]></content:encoded>
			<wfw:commentRss>http://cartesianfaith.wordpress.com/2013/01/25/lambda-r-1-1-0-released/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6b4bc36b8607b256e0ec8e320291802a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">muxspace</media:title>
		</media:content>
	</item>
		<item>
		<title>The myth of the missing Data Scientist</title>
		<link>http://cartesianfaith.wordpress.com/2013/01/07/the-myth-of-the-missing-data-scientist/</link>
		<comments>http://cartesianfaith.wordpress.com/2013/01/07/the-myth-of-the-missing-data-scientist/#comments</comments>
		<pubDate>Tue, 08 Jan 2013 00:28:42 +0000</pubDate>
		<dc:creator>Brian Lee Yung Rowe</dc:creator>
				<category><![CDATA[engineering]]></category>
		<category><![CDATA[management]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[computational engineering]]></category>
		<category><![CDATA[data science]]></category>

		<guid isPermaLink="false">http://cartesianfaith.wordpress.com/?p=625</guid>
		<description><![CDATA[Much has been said about the dire shortage of Data Scientists looming on the horizon. With the spectre of Big &#8230;<p><a href="http://cartesianfaith.wordpress.com/2013/01/07/the-myth-of-the-missing-data-scientist/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=cartesianfaith.wordpress.com&#038;blog=6379897&#038;post=625&#038;subd=cartesianfaith&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Much has been said about the dire shortage of Data Scientists looming on the horizon. With the spectre of Big Data casting shadows over every domain, it would seem we need nothing short of a caped wonder to help us see the light. Heralded as superheroes, Data Scientists will swoop into an organization and free the Lois Lane of latent knowledge from the cold clutches of Big Data. In the end the enterprise bystanders will marvel at the amazing powers these superhumans possess. Everyone will be happy and the Data Scientist will get the girl.</p>
<p>It&#8217;s a great story and a great time to be a nerd. As much as I want to believe in this story, I just don&#8217;t buy it. True, there is more data being produced now than ever before. The rate of data production is growing exponentially and people need to be able to analyze this data. Yet this dire need feels manufactured. The promoters of Data Science point to the <a href="http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/big_data_the_next_frontier_for_innovation">McKinsey study</a> that cites a &#8220;shortage of 140,000 &#8211; 190,000 people with deep analytical skills&#8221; by 2018. That&#8217;s a lot of Data Scientists! Some people claim that <a href="http://gigaom.com/data/why-data-scientists-matter-data-science-is-the-future-of-everything/">every organization will eventually need at least one Data Scientist</a> and perhaps even have their own department. This all sounds fantastic (who wouldn&#8217;t want a legion of super-nerds to be a force in culture?) except there are some serious problems with this analysis. There are three significant problems with the hyperbole surrounding Data Science: selection bias, assimilation blindness, and automation blindness. What we&#8217;ll see is that the need for Data Scientists is likely smaller than advertised with a startlingly short half-life.</p>
<h4>Selection Bias</h4>
<p>The first problem is that people assume that all 150k &#8220;people with deep analytical skills&#8221; are Data Scientists. First let&#8217;s look at the math. Suppose every organization does need at least one Data Scientist. We start with the number of public companies listed on major exchanges in the US as a proxy for &#8220;every organization&#8221;, which is <a href="http://www.cfo.com/article.cfm/14570187">about 5000</a>. Why is this a reasonable proxy? Because smaller companies probably don&#8217;t have the budget to support full time data scientists. Adding businesses listed in OTC markets, we can roughly double that number. Fine, so let&#8217;s say 10000 companies. Then on average that would mean each organization has a team of 15 Data Scientists. Wow, I see a lot of dollar signs piling up alongside the map-reduce queries.</p>
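<p>The back-of-the-envelope arithmetic is easy to reproduce in R. The low and high figures are the ends of the McKinsey range quoted above, the middle figure is the rough 150k used in this post, and the company count is the rounded estimate from the previous paragraph.</p>
<pre class="brush: r; title: ; notranslate">
# Shortage estimates: McKinsey range plus the rough midpoint used in this post
shortage &lt;- c(low = 140000, mid = 150000, high = 190000)

# Rough count of listed plus OTC companies
companies &lt;- 10000

# Implied average team size per company
shortage / companies
</pre>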
<p>Clearly there must be other professions that require analytical skills that aren&#8217;t Data Scientists. Look at the cross section of people that use R and you&#8217;ll see people in Psychology, Economics, Biology, Finance, etc. The biggest population by far is the traditional group you think of when you think analysis: engineering. McKinsey hints at this when they list the Internet of Things as being one of the sources for the exponential growth in data. This version of the future, <a href="http://www.zdnet.com/giant-general-electric-says-it-needs-silicon-valley-7000008542/">popularized by GE</a>, points to <a title="Datacentric product development and the rebirth of engineering" href="http://cartesianfaith.wordpress.com/2012/11/17/datacentric-product-development-and-the-rebirth-of-engineering/">Computational Engineers</a> as filling most of this population. When GE alone is <a href="http://bits.blogs.nytimes.com/2011/11/21/81057/">hiring 400 people to fill one development center,</a> it&#8217;s plausible that the net shortage could reach hundreds of thousands.</p>
<h4>Assimilation Blindness</h4>
<p>The next problem is what I call assimilation blindness. Even if a shortage of this scale did exist for Data Scientists, it wouldn&#8217;t be sustained. As understanding of Big Data and analytical methods becomes more widespread, the need for specialists will often diminish. A good example is how web developers used to be a prized resource but are now commoditized since even high school students can build web sites (or iPhone apps for that matter). Data Scientists will find that their role will be assimilated quickly since it only differs from traditional roles by having a big data component. What is the role of a Data Scientist? It is still up for debate, but here are some of the most popular themes I&#8217;ve seen:</p>
<ul>
<li><a href="http://www.greenplum.com/blog/topics/data-for-good/how-can-data-science-serve-the-public-good">Telling stories with data</a> (including visualization) &#8211; This is what marketers do. As tools become easier to use and analytical methods more pervasive, presumably many people in the marketing department will know how to take advantage of these tools directly rather than relying on a Data Scientist</li>
<li><a href="http://www.boston.com/jobs/news/jobdoc/2012/12/data_scientist_mines_informati.html">Finding insights in data</a> &#8211; This is what business analysts do. They&#8217;ve been trained to use analytical tools for years and know how to spot interesting phenomena in data. The tool set is different as is the scale of the data, but given most business analysts know a little SQL and basic statistics, it isn&#8217;t a stretch to conclude that they would assimilate many of the functions a Data Scientist fills</li>
<li><a href="http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/ar/1">Creating products from data</a> &#8211; This is what product managers do. In finance there are plenty of data products, and they aren&#8217;t managed or invented by Data Scientists. As data products become more mainstream, more people in the product management arena will know how to ask questions of data directly because they will have learned these skills themselves</li>
</ul>
<p>Hence while there may be a shortage in the short term, over time the Data Scientist will lose his cape and disappear into the crowd.</p>
<h4>Automation Blindness</h4>
<p>The functions of a Data Scientist that aren&#8217;t assimilated will likely be automated away. Not recognizing this phenomenon is what I call automation blindness. Numerous startups and big players such as <a href="http://www.ibmbigdatahub.com/blog/addressing-big-data-skills-gap">IBM</a> are developing tools to simplify big data analysis. Currently a big portion of a Data Scientist&#8217;s role is bringing together data from disparate sources to make an analysis possible. Once this is automated, the need for specialists will again decline.</p>
<p>In short the shortage of Data Scientists is shrouded in the myths of storytellers. There is definitely a need for people with analytical skills, and we will see this separate into skills that are generally assimilated and advanced skills used by engineers to design tools and systems that rely on data for their proper function.</p>
]]></content:encoded>
			<wfw:commentRss>http://cartesianfaith.wordpress.com/2013/01/07/the-myth-of-the-missing-data-scientist/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6b4bc36b8607b256e0ec8e320291802a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">muxspace</media:title>
		</media:content>
	</item>
		<item>
		<title>Infinite generators in R</title>
		<link>http://cartesianfaith.wordpress.com/2013/01/05/infinite-generators-in-r/</link>
		<comments>http://cartesianfaith.wordpress.com/2013/01/05/infinite-generators-in-r/#comments</comments>
		<pubDate>Sat, 05 Jan 2013 17:00:54 +0000</pubDate>
		<dc:creator>Brian Lee Yung Rowe</dc:creator>
				<category><![CDATA[functional programming]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[simulation]]></category>
		<category><![CDATA[infinite data structures]]></category>
		<category><![CDATA[lambda.r]]></category>

		<guid isPermaLink="false">http://cartesianfaith.wordpress.com/?p=666</guid>
		<description><![CDATA[This is the first in a series of posts about creating simulations in R. As a foundational discussion, I first look &#8230;<p><a href="http://cartesianfaith.wordpress.com/2013/01/05/infinite-generators-in-r/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=cartesianfaith.wordpress.com&#038;blog=6379897&#038;post=666&#038;subd=cartesianfaith&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>This is the first in a series of posts about creating simulations in R. As a foundational discussion, I first look at generators and how to create them in R. Note: If you are following along, all the examples rely on lambda.r, so be sure to have that installed (from CRAN) first. If you are not familiar with lambda.r, you can read the <a title="Functional programming with lambda.r" href="http://cartesianfaith.wordpress.com/2012/11/20/functional-programming-with-lambda-r/">introduction</a>.</p>
<p>Put simply, a generator returns a function that can produce a sequence. Generators are common in Python, where generator functions &#8220;allow you to declare a function that behaves like an iterator, i.e. it can be used in a for loop&#8221; [1]. In R, for loops are generally avoided in favor of functional approaches that use the *apply suite of higher-order functions. Generators are still useful under this paradigm. Haskell has a similar concept for producing infinite data structures based on lazy evaluation of functions [2]. In short, generators are useful because they allow you to programmatically construct a sequence.</p>
<h3>Construction</h3>
<p>Using closures, it is easy to emulate this behavior in R. There are two key ingredients to make this work: a generator function and a new apply function that applies over an iterator (the function returned by the generator).</p>
<h4>The generator</h4>
<p>At its most basic a generator is simply a function that returns a closure: a function bound to an environment that references non-local variables. Closures are a fundamental building block in functional programming and can eliminate the need for (dangerous) global variables. Our iterator is the closure and will reference a number of non-local variables defined in the scope of the generator.</p>
<pre class="brush: r; title: ; notranslate">
seq.gen(start) %as%
{
  value &lt;- start - 1
  function() {
    value &lt;&lt;- value + 1
    return(value)
  }
}
</pre>
<p>For simplicity the first generator is infinite: it will continue producing values for as long as it is called. Let&#8217;s think about that for a moment. We can produce an infinitely long sequence so long as we keep calling this function. In R we typically consider sequences as being finite. We also think about them being produced as a batch i.e. in a single function call. For example, creating a sequence from 1 to 10 is simply seq(1,10) or 1:10. The sequence is then passed to some other function as a vector. If passed to apply, it will then be iterated over element by element. This is fine for data analysis or batch-oriented back testing. However, what if instead of a batch we want to run a simulation as though the model or system were acting for real? This is where an iterator is useful because it can produce inputs that behave like real inputs.</p>
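<p>To see the difference between a batch sequence and an iterator, here is a minimal sketch in plain R (no lambda.r). It assumes a hypothetical helper named make.counter that mirrors the seq.gen closure above. Each call produces only the next value, and two iterators created from the same generator keep independent state.</p>
<pre class="brush: r; title: ; notranslate">
# Hypothetical base R version of the generator, for illustration only
make.counter &lt;- function(start) {
  value &lt;- start - 1
  function() {
    value &lt;&lt;- value + 1   # update the variable in the enclosing scope
    value
  }
}

batch &lt;- 1:3              # a finite sequence, produced all at once
it.a &lt;- make.counter(1)   # an iterator, produced one value per call
it.b &lt;- make.counter(10)

first.three &lt;- c(it.a(), it.a(), it.a())   # same values as batch
other &lt;- it.b()                             # it.b is unaffected by it.a
</pre>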
<h4>Introducing iapply</h4>
<p>Since the standard suite of apply functions expects a complete sequence, this technique cannot be used out of the box. Instead we need to create our own apply function, which we&#8217;ll call iapply (i for iterator). It acts like the other apply functions with the exception that it understands iterators.</p>
<pre class="brush: r; title: ; notranslate">
iapply(iterator, fn, simplify=TRUE, formatter=function(x) format(x,&quot;%Y-%m-%d&quot;)) %as%
{
  out &lt;- list()
  while (! is.null(input &lt;- iterator()))
  {
    df &lt;- data.frame(fn(input))
    if (ncol(df) &gt; 1)
      out[formatter(input)][[1]] &lt;- df
    else
      out[formatter(input)] &lt;- df
  }
  if (simplify) out &lt;- do.call(rbind,out)
  out
}
</pre>
<p>There is no magic in iapply. As shown it&#8217;s really just looping over the iterator, calling the function fn with the result of the iterator, doing some formatting, and finally collecting the result. The format function I use is for dates because my primary use case is to create a sequence of dates. I then simulate data over a sequence of dates and pass that into my system as though it were real data. The advantage is that I only write the model once, and I also don&#8217;t have to worry about accidentally using data in the past since model testing behaves exactly the same as real world operation.</p>
<h4>Embedding control into an iterator</h4>
<p>Since the current generator is infinite, the sequence will never stop. This means that iapply will never return. To resolve this minor detail, we need to add some control logic into the iterator. Remember that the iterator is a closure, so it is easy to add some more variables to the non-local scope and use them for control. The updated function provides a stop value, a step interval, and a way to reset the iterator back to the original starting value. There&#8217;s also an additional clause to handle character data and convert it to Date objects for convenience.</p>
<pre class="brush: r; title: ; notranslate">
seq.gen(start, stop, step=1) %when% {
  is.character(start)
  is.character(stop)
} %as% {
  seq.gen(as.Date(start), as.Date(stop), step)
}

seq.gen(start, stop=Inf, step=1) %as%
{
  first &lt;- value &lt;- start - step
  function(reset=FALSE) {
    if (reset) { value &lt;&lt;- first; return(invisible()) }
    if (value &gt;= stop) return(NULL)

    value &lt;&lt;- value + step
    return(value)
  }
}
</pre>
<p>As an example, we can now call the generator to provide a Date iterator. You can pass reset=TRUE at any time to reset the iterator to the first value.</p>
<pre>&gt; date.fn &lt;- seq.gen('2013-01-01', '2013-02-01')
&gt; date.fn()
[1] "2013-01-01"
&gt; date.fn()
[1] "2013-01-02"
&gt; date.fn()
[1] "2013-01-03"
&gt; date.fn(reset=TRUE)
&gt; date.fn()
[1] "2013-01-01"</pre>
<p>Keep in mind that once you hit the end of the iterator, it will return NULL unless you tell it to reset.</p>
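<p>The stop and reset behavior can also be sketched in plain R without lambda.r. The name bounded.gen below is hypothetical, and the sketch assumes numeric or Date arguments rather than character strings.</p>
<pre class="brush: r; title: ; notranslate">
# Hypothetical base R analogue of the bounded generator
bounded.gen &lt;- function(start, stop = Inf, step = 1) {
  first &lt;- value &lt;- start - step
  function(reset = FALSE) {
    if (reset) { value &lt;&lt;- first; return(invisible(NULL)) }
    if (value &gt;= stop) return(NULL)   # exhausted: signal with NULL
    value &lt;&lt;- value + step
    value
  }
}

it &lt;- bounded.gen(1, 3)
drained &lt;- c(it(), it(), it())   # 1 2 3
at.end &lt;- it()                   # NULL: the iterator is exhausted
it(reset = TRUE)
again &lt;- it()                    # back to the first value, 1
</pre>
<p>Returning NULL at the end is what lets a loop such as the one in iapply terminate.</p>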
<h4>A complete example</h4>
<p>The iterator is now ready to use within iapply. To use the technique, simply call the generator to create an instance and pass it along with a function to iapply.</p>
<pre>&gt; date.fn &lt;- seq.gen('2013-01-01', '2013-02-01')
&gt; iapply(date.fn, function(x) rnorm(1))
                    [,1]
2013-01-01 -1.311821e+00
2013-01-02  4.112014e-01
2013-01-03  6.985409e-02
2013-01-04  3.905463e-01
...</pre>
<p>Admittedly using all this structure to generate a sequence of random numbers is counterproductive. In the next post, I&#8217;ll describe how to simulate stock price data using this approach and how to plug it into a model. It will then be clear what advantages this technique provides.</p>
<h3>Conclusion</h3>
<p>Generators are a powerful technique for programming a sequence. Armed with an iterable function it is possible to create simulations that operate realistically through the system. The advantage to this approach is that model development, system development, and testing can all share the same code base, meaning less work for you.</p>
<h2>References</h2>
<p>[1] <a href="http://wiki.python.org/moin/Generators">Generators</a><br />
[2] <a href="http://www.haskell.org/tutorial/functions.html">Functions</a></p>
]]></content:encoded>
			<wfw:commentRss>http://cartesianfaith.wordpress.com/2013/01/05/infinite-generators-in-r/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6b4bc36b8607b256e0ec8e320291802a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">muxspace</media:title>
		</media:content>
	</item>
		<item>
		<title>Confident package releases in R with crant</title>
		<link>http://cartesianfaith.wordpress.com/2012/11/29/confident-package-releases-in-r-with-crant/</link>
		<comments>http://cartesianfaith.wordpress.com/2012/11/29/confident-package-releases-in-r-with-crant/#comments</comments>
		<pubDate>Thu, 29 Nov 2012 14:17:28 +0000</pubDate>
		<dc:creator>Brian Lee Yung Rowe</dc:creator>
				<category><![CDATA[infrastructure]]></category>
		<category><![CDATA[package management]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://cartesianfaith.wordpress.com/?p=648</guid>
		<description><![CDATA[I recently released the new lambda.r package on CRAN for functional programming. This was my first new package in quite some &#8230;<p><a href="http://cartesianfaith.wordpress.com/2012/11/29/confident-package-releases-in-r-with-crant/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=cartesianfaith.wordpress.com&#038;blog=6379897&#038;post=648&#038;subd=cartesianfaith&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>I recently released the new <a title="Functional programming with lambda.r" href="http://cartesianfaith.wordpress.com/2012/11/20/functional-programming-with-lambda-r/">lambda.r</a> package on <a href="http://cran.r-project.org/web/packages/lambda.r/index.html">CRAN</a> for functional programming. This was my first new package in quite some time, and I forgot how onerous package releases to CRAN can be.</p>
<p>What I didn&#8217;t realize is that packages are tested against <a href="http://cran.r-project.org/doc/manuals/R-admin.html#Getting-patched-and-development-versions">three distinct versions of R</a>:</p>
<ul>
<li>R &#8211; The latest official point release (e.g. 2.15)</li>
<li>R-patched &#8211; The latest patch release (e.g. 2.15.2)</li>
<li>R-devel &#8211; The latest source in the development branch</li>
</ul>
<p>It just so happened that a dependency of lambda.r failed in the R-devel branch. The update then broke my package but only for one of the builds.</p>
<p>Most of the time worrying about these slightly different versions isn&#8217;t a big deal, but the more low-level a package is, the more you have to worry about it. Lambda.r is one such package, and I was stuck in a curious situation where the CRAN maintainers were telling me the package was inconsistently failing. Joyriding aside, I&#8217;m not one for driving blind. Instead I created some tools to provide the coverage I needed, so I can release packages with confidence and not waste anybody&#8217;s time. The collection of these tools is called <a href="https://github.com/muxspace/crant">crant</a>.</p>
<h2>Features</h2>
<p>Crant does just three things. Starting from scratch you can:</p>
<ul>
<li>Create the three standalone R builds mentioned above. Each version is downloaded from source and can be updated at any time</li>
<li>Install third-party packages for each build (e.g. RUnit, testthat). Basically any package that is only suggested (listed under Suggests in DESCRIPTION) rather than required (Depends or Imports) will need to be installed prior to building your own package</li>
<li>Build and install your package. The rant script will also set the package version and perform the CRAN checks as necessary.</li>
</ul>
<p>The key to crant is a consistent build chain for any version of R. This flexibility means that you can test your package against any set of R sources (not just the three tags above), depending on how much backwards compatibility you require.</p>
<h2>Usage</h2>
<p>Here is a quick guide to crant. More documentation is available on the <a href="https://github.com/muxspace/crant">source page</a>.</p>
<h3>Building the Environment</h3>
<p>The buildenv.sh script can set up a clean OS with all the tools necessary to build R from source, such as make, gcc, gfortran, and Java. Note that only Debian-flavored OSes are compatible. Include the -d option to install these dependencies (only do this once).</p>
<pre class="brush: r; title: ; notranslate">
export PATH=$PATH:path/to/crant
buildenv.sh -u # Add -d if you have a clean OS
</pre>
<p>The -u option tells the script to get the R source and update it to the latest version. It will also build the R source and set up the package directory to something portable and safe from the other builds.</p>
<p>The end result is that you will have 3 installations of R built from source that correspond to the latest minor release (e.g. 2.15), the latest patch release (e.g. 2.15.2), and the current development source (R-devel).</p>
<h3>Install Package Dependencies</h3>
<p>Now armed with three R builds, you can install packages into each one. For fine-grained control, you need to do this with each version of R. I may wrap this up into a single command if there is interest.</p>
<pre class="brush: r; title: ; notranslate">
setuplib.sh -R ~/devel/bin/R RUnit testthat
setuplib.sh -R ~/patch/bin/R RUnit testthat
setuplib.sh -R ~/release/bin/R RUnit testthat
</pre>
<p>Note that you only need to install packages that are Suggested. Required dependencies should be installed automatically during the package build process.</p>
<h3>Build Your Package</h3>
<p>At this point your environment is fully configured. You can now test your package against these R builds. The rant script will build the package, run tests, and check for CRAN compatibility. It can also optionally install the package into the R build that is running.</p>
<pre class="brush: r; title: ; notranslate">
rant -v 1.0.0 -R path/to/R your.package
</pre>
<p>Note that rant will automatically set the version and date in the DESCRIPTION and package.Rd files for you. This works if you use the placeholders &#8216;{version}&#8217; and &#8216;{date}&#8217;, respectively, in these files.</p>
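<p>The placeholder substitution can be pictured with a few lines of R. This is only an illustration of the idea, not rant&#8217;s actual implementation. The function name fill.placeholders and the sample DESCRIPTION text are hypothetical.</p>
<pre class="brush: r; title: ; notranslate">
# Hypothetical helper: replace the literal placeholders in a character vector
fill.placeholders &lt;- function(lines, version, date = Sys.Date()) {
  lines &lt;- gsub('{version}', version, lines, fixed = TRUE)
  gsub('{date}', format(date, '%Y-%m-%d'), lines, fixed = TRUE)
}

template &lt;- c('Package: your.package',
              'Version: {version}',
              'Date: {date}')
filled &lt;- fill.placeholders(template, '1.0.0', as.Date('2012-11-29'))
</pre>
<p>Using fixed = TRUE treats the braces as literal characters rather than as regular expression syntax.</p>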
<p>During package development, you can run the rant script whenever you want to test the integrity of the package. You will find the source packages in a directory called &#8216;export&#8217;. These can be uploaded to CRAN.</p>
<p>To summarize, anyone writing packages should take a look at crant. Having a consistent and easily reproducible build chain can greatly improve the success ratio of a CRAN upload in addition to making the process more efficient.</p>
]]></content:encoded>
			<wfw:commentRss>http://cartesianfaith.wordpress.com/2012/11/29/confident-package-releases-in-r-with-crant/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6b4bc36b8607b256e0ec8e320291802a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">muxspace</media:title>
		</media:content>
	</item>
		<item>
		<title>Functional programming with lambda.r</title>
		<link>http://cartesianfaith.wordpress.com/2012/11/20/functional-programming-with-lambda-r/</link>
		<comments>http://cartesianfaith.wordpress.com/2012/11/20/functional-programming-with-lambda-r/#comments</comments>
		<pubDate>Tue, 20 Nov 2012 21:50:16 +0000</pubDate>
		<dc:creator>Brian Lee Yung Rowe</dc:creator>
				<category><![CDATA[functional programming]]></category>
		<category><![CDATA[lambda.r]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[system design]]></category>
		<category><![CDATA[attributes]]></category>
		<category><![CDATA[guards]]></category>
		<category><![CDATA[pattern matching]]></category>
		<category><![CDATA[types]]></category>

		<guid isPermaLink="false">http://cartesianfaith.wordpress.com/?p=610</guid>
		<description><![CDATA[After a four month simmer on various back burners and package conflicts, I&#8217;m pleased to announce that the successor to &#8230;<p><a href="http://cartesianfaith.wordpress.com/2012/11/20/functional-programming-with-lambda-r/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=cartesianfaith.wordpress.com&#038;blog=6379897&#038;post=610&#038;subd=cartesianfaith&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>After a four-month simmer on various back burners and package conflicts, I&#8217;m pleased to announce that the successor to futile.paradigm is officially available on CRAN. The new package is <a href="http://cran.r-project.org/web/packages/lambda.r/index.html">lambda.r</a> (source on <a href="https://github.com/muxspace/lambda.r">GitHub</a>), which hopefully conveys the purpose of the library better than its whimsical predecessor. In some ways this new version deserves a more serious name, as the package has matured quite a bit and is part and parcel of a book I&#8217;m writing on computational systems and functional programming.</p>
<p>So what exactly is lambda.r? Put simply, lambda.r gives you functional programming in the R language. While R has many functional features built into the language, application development in R is a decidedly object-oriented affair. I won&#8217;t go into all the reasons why it&#8217;s better to write computational systems in a functional paradigm since that is covered in depth in my forthcoming book &#8220;Computational Finance and the Lambda Calculus&#8221;. However, here are the salient points:</p>
<ul>
<li>Conceptual consistency with mathematics resulting in less translation error between model and system (see my slides from <a href="http://www.rinfinance.com/agenda/2011/BrianRowe.pdf">R/Finance 2011</a>)</li>
<li>Modular and encapsulated architecture that makes growth of a system easier to manage (not to mention easier to accommodate disparate computing needs &#8212; think parallel alternatives of the same processing pipeline)</li>
<li>Efficiency in application development since wiring is trivial</li>
</ul>
<h2>Features</h2>
<p>The fundamental goal of lambda.r is to provide a solid architectural foundation that remains intact through the prototyping and development phases of a model or application. One half is accomplished with a functional syntax that builds modularity and encapsulation into every function. The other half is through the incremental adoption of constraints in the system. This article will focus primarily on the features, and in a separate article I will outline how to best leverage these features as a system matures.</p>
<h3>Pattern Matching</h3>
<p>Functional languages often use pattern matching to define functions in multiple parts. This syntax is reminiscent of sequences or functions with initial values in addition to multi-part definitions. Removing control flow from function definitions makes functions easier to understand and reduces the translation error from math to code.</p>
<h4>Fibonacci Sequence</h4>
<p>For example, the ubiquitous Fibonacci sequence is defined as</p>
<p><img src='http://s0.wp.com/latex.php?latex=F_%7Bn%7D+%3D+F_%7Bn-1%7D+%2B+F_%7Bn-2%7D+&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='F_{n} = F_{n-1} + F_{n-2} ' title='F_{n} = F_{n-1} + F_{n-2} ' class='latex' />, where <img src='http://s0.wp.com/latex.php?latex=F_%7B1%7D+%3D+F_%7B2%7D+%3D+1+&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='F_{1} = F_{2} = 1 ' title='F_{1} = F_{2} = 1 ' class='latex' /></p>
<p>In standard R, one way to define this is with an if/else control block or function [1].</p>
<pre class="brush: r; title: ; notranslate">
fib &lt;- function(n)
{
  ifelse(n &lt; 2, 1, fib(n - 1) + fib(n - 2))
}
</pre>
<p>Using lambda.r, pattern matching defines the function in three clauses. The behavior of the function is free of clutter as each clause is self-contained and self-explanatory.</p>
<pre class="brush: r; title: ; notranslate">
fib(0) %as% 1
fib(1) %as% 1
fib(n) %as% { fib(n-1) + fib(n-2) }
</pre>
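<p>Both definitions index the sequence from zero, so fib(0) and fib(1) play the role of the two initial values in the formula. A quick check in plain R, using a hypothetical name fib.base so as not to clash with the lambda.r definition:</p>
<pre class="brush: r; title: ; notranslate">
# Plain R recursion with the same base cases as the clauses above
fib.base &lt;- function(n) {
  if (n &lt; 2) 1 else fib.base(n - 1) + fib.base(n - 2)
}

first.values &lt;- sapply(0:6, fib.base)   # 1 1 2 3 5 8 13
</pre>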
<h4>Heaviside Step Function</h4>
<p>When represented as a piecewise constant function, the Heaviside step function is defined in three parts. [2]</p>
<p><img src='http://s0.wp.com/latex.php?latex=H%28x%29+%3D+%5Cbegin%7Bcases%7D++++0+%26+x+%3C+0+%5C%5C++++1%2F2+%26+x+%3D+0+%5C%5C++++1+%26+x+%3E+0++++%5Cend%7Bcases%7D+&amp;bg=ffffff&amp;fg=000&amp;s=0' alt='H(x) = &#92;begin{cases}    0 &amp; x &lt; 0 &#92;&#92;    1/2 &amp; x = 0 &#92;&#92;    1 &amp; x &gt; 0    &#92;end{cases} ' title='H(x) = &#92;begin{cases}    0 &amp; x &lt; 0 &#92;&#92;    1/2 &amp; x = 0 &#92;&#92;    1 &amp; x &gt; 0    &#92;end{cases} ' class='latex' /></p>
<p>Using pattern matching in lambda.r, the function can be defined almost verbatim.</p>
<pre class="brush: r; title: ; notranslate">
h.step(n) %when% { n &lt; 0 } %as% 0
h.step(0) %as% 0.5
h.step(n) %as% 1
</pre>
<p>In languages that don&#8217;t support pattern matching, again if/else control structures are used to implement these sorts of functions, which can get complicated as more cases are added to a function. A good example of this is the &#8216;optim&#8217; function in R, which embeds a number of cases within the function definition.</p>
<h3>Guard Statements</h3>
<p>The last example sneaks in a guard statement along with pattern matching. Guards provide a rich vocabulary to control when a specific function clause is executed. Each guard statement is a logical expression. Multiple expressions can be present in a guard block, so that the function clause only executes when all the expressions evaluate to TRUE. Using the Fibonacci example above, we can add an argument check to only allow integers.</p>
<pre class="brush: r; title: ; notranslate">
fib(0) %as% 1
fib(1) %as% 1
fib(n) %when% {
  is.integer(n)
  n &gt; 1
} %as% { fib(n-1) + fib(n-2) }
</pre>
<p>If none of the clauses are satisfied, lambda.r will complain, telling you that it couldn&#8217;t find a matching function clause.</p>
<pre>&gt; fib(2)
Error in UseFunction("fib", ...) : No valid function for 'fib(2)'
&gt; fib(as.integer(2))
[1] 2</pre>
<p>Note: If you are running the examples as you are reading along, then you need to either seal() the functions or rm() the current definition prior to redefining the function. The reason is that function clauses are additive. You can add as many clauses as you want, and they will be evaluated in the order they were declared. Since lambda.r has no way of knowing when you are done defining your function you must explicitly tell it via the seal() function.</p>
<h3>Types</h3>
<p>Custom types can be defined in lambda.r. These types can be used in type constraints to provide type safety and distinguish one function clause from another. All types must be defined using PascalCase.</p>
<h4>Type Constructors</h4>
<p>A type constructor is simply a function that creates a type. The name of the function is the name of the type. The return value will automatically be typed while also preserving existing type information. This means that you can create type hierarchies as needed.</p>
<pre class="brush: r; title: ; notranslate">
Point(x,y) %as% list(x=x,y=y)
Polar(r,theta) %as% list(r=r,theta=theta)
</pre>
<p>In this example we use a list as the underlying data structure. To create an instance of this type simply call the constructor.</p>
<pre class="brush: r; title: ; notranslate">
point.1 &lt;- Point(2,3)
point.2 &lt;- Point(5,7)
</pre>
<p>Under the hood lambda.r leverages the S3 class mechanism, which means that lambda.r types are compatible with S3 classes.</p>
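<p>For readers unfamiliar with S3, the following plain R sketch shows the kind of class tagging and dispatch that the mechanism provides. It is an illustration under assumptions: the generic norm2 is hypothetical, and the exact class vector that lambda.r assigns may differ from the one used here.</p>
<pre class="brush: r; title: ; notranslate">
# Tag a plain list with an S3 class (assumed class vector, for illustration)
p &lt;- structure(list(x = 2, y = 3), class = c('Point', 'list'))

# A hypothetical S3 generic with a method for the Point class
norm2 &lt;- function(obj) UseMethod('norm2')
norm2.Point &lt;- function(obj) sqrt(obj$x^2 + obj$y^2)

is.point &lt;- inherits(p, 'Point')   # TRUE
len &lt;- norm2(p)                     # dispatches to norm2.Point
</pre>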
<h4>Type Constraints</h4>
<p>Types by themselves aren&#8217;t all that interesting. Once we define the types, they can be used as constraints on a function.</p>
<pre class="brush: r; title: ; notranslate">
distance(a,b) %::% Point : Point : numeric
distance(a,b) %as% { ((a$x - b$x)^2 + (a$y - b$y)^2)^.5 }

distance(a,b) %::% Polar : Polar : numeric
distance(a,b) %as%
{
  (a$r^2 + b$r^2 - 2 * a$r * b$r * cos(a$theta - b$theta))^.5
}
</pre>
<p>As shown above each function clause can have its own constraint. Since type constraints are greedy, a declared constraint will apply to every successive function clause until a new type constraint is encountered.</p>
<pre>&gt; distance(point.1, point.2)
[1] 5</pre>
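<p>As a sanity check on the two formulas, the polar clause (the law of cosines) should agree with the Cartesian clause once the points are converted. The sketch below uses plain R functions with hypothetical names (cart.dist, polar.dist, to.polar) instead of lambda.r clauses.</p>
<pre class="brush: r; title: ; notranslate">
cart.dist &lt;- function(a, b) sqrt((a$x - b$x)^2 + (a$y - b$y)^2)
polar.dist &lt;- function(a, b) {
  sqrt(a$r^2 + b$r^2 - 2 * a$r * b$r * cos(a$theta - b$theta))
}
# Convert Cartesian coordinates to polar coordinates
to.polar &lt;- function(p) list(r = sqrt(p$x^2 + p$y^2), theta = atan2(p$y, p$x))

p1 &lt;- list(x = 2, y = 3)
p2 &lt;- list(x = 5, y = 7)
d.cart &lt;- cart.dist(p1, p2)                        # 5
d.polar &lt;- polar.dist(to.polar(p1), to.polar(p2))  # 5, up to rounding
</pre>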
<h3>Attributes</h3>
<p>Types are great for adding structure and safety to an application. However, types can have diminishing returns as more of them are introduced. In general lambda.r advocates using existing data structures where possible to minimize type clutter. Of course, if data.frames and matrices are used for most operations, how do you differentiate function clauses? The answer is attributes, which come standard with R. Attributes can be considered metadata that is orthogonal to the core data structure. They are preserved through many operations, so they can be carried through a process. Lambda.r makes working with attributes so easy that they should become second nature fairly quickly.</p>
<p>With lambda.r you can access attributes via the &#8216;@&#8217; symbol. Define them in a type constructor as shown below.</p>
<pre class="brush: r; title: ; notranslate">
Temperature(x, system='metric', units='celsius') %as%
{
  x@system &lt;- system
  x@units &lt;- units
  x
}
</pre>
<p>Function clauses can now be defined based on the value of an attribute.</p>
<pre class="brush: r; title: ; notranslate">
freezing(x) %::% Temperature : logical
freezing(x) %when% {
  x@system == 'metric'
  x@units == 'celsius'
} %as% {
  if (x &lt; 0) { TRUE }
  else { FALSE }
}

freezing(x) %when% {
  x@system == 'metric'
  x@units == 'kelvin'
} %as% {
  if (x &lt; 273) { TRUE }
  else { FALSE }
}
</pre>
<p>It is trivial then to check whether a given temperature is freezing, based on the units. This approach can be extended to objects like covariance matrices to preserve information that is normally lost in the creation of the matrix (e.g. number of observations).</p>
<pre class="brush: r; title: ; notranslate">
ctemp &lt;- Temperature(20)
freezing(ctemp)
</pre>
<p>Note that the Temperature type extends the type of &#8216;x&#8217;, so it is also a numeric value. This means that you can add a scalar to a Temperature object, and everything behaves as you would expect.</p>
<pre>&gt; ctemp1 &lt;- ctemp - 21
&gt; freezing(ctemp1)
[1] TRUE</pre>
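<p>In base R the same metadata is read and written with attr(). The sketch below does not use lambda.r. It illustrates which operations keep attributes: arithmetic with a scalar carries them along, whereas subsetting drops them.</p>
<pre class="brush: r; title: ; notranslate">
temp &lt;- 20
attr(temp, 'system') &lt;- 'metric'
attr(temp, 'units') &lt;- 'celsius'

cooler &lt;- temp - 21                   # arithmetic keeps the attributes
units.after &lt;- attr(cooler, 'units')  # 'celsius'

readings &lt;- c(18, 20, 22)
attr(readings, 'units') &lt;- 'celsius'
subset.units &lt;- attr(readings[1:2], 'units')   # NULL: subsetting drops them
</pre>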
<p>Types and attributes are two essential tools in the lambda.r toolkit. In this section I&#8217;ve also illustrated how S3 classes can naturally be mixed and matched with lambda.r classes.</p>
<h3>Introspection</h3>
<p>The goal of lambda.r is to provide rich functionality with a simple and intuitive syntax. To accomplish this there is a lot of wiring behind the scenes. While most of the implementation can safely be ignored, there are times when it is necessary to look under the hood for troubleshooting purposes. Lambda.r provides a number of tools to make debugging and introspection as simple as possible.</p>
<p>The default output of a lambda.r function or type gives a summary view of the function clauses associated with this function. This is an abridged view to prevent long code listings from obscuring the high level summary. Any type constraints and guard statements are included in this display as well as default values.</p>
<pre>&gt; freezing
&lt;function&gt;
[[1]]
freezing(x) %::% Temperature:logical 
freezing(x) %when% {
  x@system == "metric"
  x@units == "celsius"
} %as% ...
[[2]]
freezing(x) %::% Temperature:logical 
freezing(x) %when% {
  x@system == "metric"
  x@units == "kelvin"
} %as% ...</pre>
<p>Index values prefix each function clause. Use this key when looking up the definition of an explicit function clause with the &#8216;describe&#8217; function.</p>
<pre>&gt; describe(freezing,2)
function(x) { if ( x &lt; 273 ) {
TRUE
}
else {
FALSE
} }
&lt;environment: 0x7f8cfcd7de60&gt;</pre>
<h4>Debugging</h4>
<p>Since lambda.r implements its own dispatching function (UseFunction), you cannot use the standard &#8216;debug&#8217; function to debug a function clause. Instead use the supplied &#8216;debug.lr&#8217; and &#8216;undebug.lr&#8217;. These functions will allow you to step through any of the function clauses within a lambda.r function.</p>
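<p>As a rough sketch of that workflow, the snippet below defines a throwaway two-clause function (sign_label is hypothetical and exists only for illustration), flags it with debug.lr, calls it, and then clears the flag. It assumes an interactive session, since that is where the browser prompt appears.</p>
<pre class="brush: r; title: ; notranslate">
library(lambda.r)

# Hypothetical function, defined only to have something to step through
sign_label(x) %when% { x &lt; 0 } %as% "negative"
sign_label(x) %as% "non-negative"

debug.lr(sign_label)     # flag sign_label for debugging
sign_label(-3)           # interactively, this steps through the matching clause
undebug.lr(sign_label)   # clear the flag
sign_label(-3)           # evaluates normally again
</pre>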
<h2>Examples</h2>
<p>All examples are in the source package as unit tests. Below are some highlights to give you an idea of how to use the package.</p>
<h3>Prices and Returns</h3>
<p>This example shows how to use attributes to limit the scope of a function for specific types of data. Note that the definition of Prices makes no restriction on series, so this definition is compatible with a vector or data.frame as the underlying data structure.</p>
<pre class="brush: r; title: ; notranslate">
Prices(series, asset.class='equity', periodicity='daily') %as%
{
  series@asset.class &lt;- asset.class
  series@periodicity &lt;- periodicity
  series
}

returns(x) %when% {
  x@asset.class == &quot;equity&quot;
  x@periodicity == &quot;daily&quot;
} %as% {
  x[2:length(x)] / x[1:(length(x) - 1)] - 1
}
</pre>
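<p>A minimal usage sketch, assuming lambda.r is loaded and the two definitions above have been evaluated. The price values are made up, and the error handling relies on lambda.r raising an error when no clause matches the arguments.</p>
<pre class="brush: r; title: ; notranslate">
library(lambda.r)

# Made-up daily equity prices; the constructor defaults supply the attributes
p &lt;- Prices(c(100, 102, 99.96))
r &lt;- returns(p)    # simple returns, approximately 0.02 and -0.02

# Weekly prices carry a different periodicity attribute
w &lt;- Prices(c(100, 102), 'equity', 'weekly')

# No returns clause is defined for weekly data, so the guards reject this call
res &lt;- tryCatch(returns(w), error=function(e) 'no matching clause')
</pre>
<p>Supporting weekly data would then be a matter of adding another returns clause with its own guard, rather than branching inside the existing one.</p>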
<h3>Taylor Approximation</h3>
<p>This is a simple numerical implementation of a Taylor approximation.</p>
<pre class="brush: r; title: ; notranslate">
fac(1) %as% 1
fac(n) %when% { n &gt; 0 } %as% { n * fac(n - 1) }

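# Central differences. The second derivative divides by h^2, so it uses a
# larger step than the first to keep floating-point round-off in check.
d(f, 1, h=10^-9) %as% function(x) { (f(x + h) - f(x - h)) / (2*h) }
d(f, 2, h=10^-4) %as% function(x) { (f(x + h) - 2*f(x) + f(x - h)) / h^2 }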

taylor(f, a, step=2) %as% taylor(f, a, step, 1, function(x) f(a))
taylor(f, a, 0, k, g) %as% g
taylor(f, a, step, k, g) %as% {
  df &lt;- d(f,k)
  g1 &lt;- function(x) { g(x) + df(a) * (x - a)^k / fac(k) }
  taylor(f, a, step-1, k+1, g1)
}
</pre>
<p>Use the preceding definitions like so:</p>
<pre>&gt; f &lt;- taylor(sin, pi)
&gt; v &lt;- f(3.1)</pre>
<h2>References</h2>
<p>[1] <a href="http://www.johnmyleswhite.com/notebook/2012/03/31/julia-i-love-you/">Julia, I Love You</a></p>
<p>[2] <a href="http://mathworld.wolfram.com/HeavisideStepFunction.html">Heaviside Step Function</a></p>
]]></content:encoded>
			<wfw:commentRss>http://cartesianfaith.wordpress.com/2012/11/20/functional-programming-with-lambda-r/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6b4bc36b8607b256e0ec8e320291802a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">muxspace</media:title>
		</media:content>
	</item>
		<item>
		<title>Datacentric product development and the rebirth of engineering</title>
		<link>http://cartesianfaith.wordpress.com/2012/11/17/datacentric-product-development-and-the-rebirth-of-engineering/</link>
		<comments>http://cartesianfaith.wordpress.com/2012/11/17/datacentric-product-development-and-the-rebirth-of-engineering/#comments</comments>
		<pubDate>Sat, 17 Nov 2012 14:30:57 +0000</pubDate>
		<dc:creator>Brian Lee Yung Rowe</dc:creator>
				<category><![CDATA[engineering]]></category>
		<category><![CDATA[management]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://cartesianfaith.wordpress.com/?p=630</guid>
		<description><![CDATA[An old irony in New York is the ubiquity of the &#8216;gourmet deli&#8217;. It is hard to find a deli &#8230;<p><a href="http://cartesianfaith.wordpress.com/2012/11/17/datacentric-product-development-and-the-rebirth-of-engineering/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=cartesianfaith.wordpress.com&#038;blog=6379897&#038;post=630&#038;subd=cartesianfaith&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>An old irony in New York is the ubiquity of the &#8216;gourmet deli&#8217;. It is hard to find a deli that doesn&#8217;t proclaim to be gourmet. It is so commonplace that the word gourmet has lost all of its original meaning and perhaps taken on the opposite meaning. A similar phenomenon happened with the word engineer resulting in a dilution of its meaning. I recently heard someone describe themselves as a &#8216;domestic engineer&#8217;. While presumably tongue-in-cheek, the overuse and perhaps abuse of the term engineer has precipitated its loss of meaning and value. Other pithy examples include &#8216;sanitation engineer&#8217;, &#8216;sales engineer&#8217;, or &#8216;quality assurance engineer&#8217;.</p>
<p>Many will argue that a title doesn&#8217;t really mean all that much and for the most part I agree. However being an engineer used to be more than a title: it also defined your role, your education, and your liability. I&#8217;m biased because I have a degree in Electrical Engineering and have gone through the rigors of the discipline culminating in the 8-hour long marathon Engineer In Training exam. One of my professors would state this disclaimer at the start of his Communications (where you design a wireless transmission network) class: &#8220;There are two things you won&#8217;t be able to do with a degree in Electrical Engineering: wire a house and fix a TV&#8221;. We all laughed at the time but it took a while for the real message to sink in. His point was that engineering is a professional discipline that focuses on theory and design. We learned why things work and how to predict the behavior of systems we design based on first principles. It is not a technical trade that teaches you how to perform specific functions. So we may not know how to wire a house, but it didn&#8217;t matter because under the hood we knew all the underlying theory.</p>
<p>The curious thing about engineering is that engineers are often mistaken for people who build things. The world of software has reinforced this stereotype, which I think needs to be addressed. At our core engineers are problem solvers: people who use mathematics, physics, and other scientific fields along with analytical skills to solve practical design problems. A civil engineer designs bridges and transforms building architectures into safe, viable structures. Electrical engineers design microchips, computers, wireless networks, power stations. Aeronautical engineers design aircraft. The list goes on and on. Note that in all these domains an engineer does not build these things as that is the domain of manufacturing or construction. Engineering is about design and prototyping that culminates in an explicit and detailed plan that describes how to build or manufacture the end product. The epitome of this philosophy comes from Qualcomm who revolutionized the integrated circuit (IC) business by focusing only on the IP surrounding the design of wireless ICs and eschewing manufacturing altogether. Other companies like ARM Holdings have since followed suit.</p>
<p>So where does that leave software? The challenge with software is that it is so widely adopted and the tools so easy that almost anyone can program these days (and in fact programming is now being taught to primary school students [1]). Knowing how to program is different from being a programmer, which is different from being an engineer. For example mathematics is taught throughout primary and secondary education but no one in their right mind would call themselves a mathematician unless they have a degree in mathematics. Most people I know with a math degree (including myself) do not refer to themselves as mathematicians. The same is true of physics: we learn a fair amount of physics but we don&#8217;t call ourselves physicists. The same really should apply to software: just because you know how to write software doesn&#8217;t mean you are a software engineer (a term I don&#8217;t particularly like). To be an engineer your understanding needs to be more than how to write software. You also need to understand data structures and algorithms, system design, resiliency and redundancy, analysis, numerical methods, etc. Without this foundation there is little guarantee that in production a system will operate at a level of reliability and accuracy that is expected of the system. This isn&#8217;t so different from a lay person building a house. While the house may stand and be livable, there is little analysis surrounding its structural stability, how much load it can bear, etc. I&#8217;m close to throwing out the baby with the bath water, so where does that leave us?</p>
<p>We are now truly in an information age, where even the most mundane products have become datacentric. This means that a product or service cannot exist without data to drive it. I&#8217;m not talking about Salesforce.com where you enter in data and retrieve it later. I&#8217;m talking search engines, mapping tools (Google Maps), recommendation systems (Amazon, Netflix), digital personal assistants (Siri), etc. I call this datacentric product development. Finance has been doing this for years under various guises. It is particularly prevalent in portfolio management, trading, and risk management, not to mention applications like creditworthiness or fraud detection. The term Financial Engineer embodies the role, which involves equal parts analysis, modeling, and development. These people are typically part of business groups and not part of the greater IT organization, which is usually designated a cost center. What is remarkable is that the role of Financial Engineer embodies this notion that being an engineer is about applying analytical methods to solve problems. The solutions just happen to be written in software. As datacentric product development becomes more widespread, being just a programmer will not be sufficient.</p>
<p>Organizationally I think this model is what technology product companies need to move towards. Product development needs more engineers and fewer programmers. What this means is that a single role must be responsible for understanding the business domain, assessing the problem, designing a solution, and illustrating that it works. The toolset of the engineer is more than just a Java IDE. It also includes working in languages like R or Matlab to analyze data and test hypotheses. In many cases this will also mean building a production system, while in other cases it may mean handing off to a vendor that specializes in building software. In some ways I&#8217;m trying to dial back the clock to the &#8220;good ol&#8217; days&#8221; but not for reasons of nostalgia. It&#8217;s really about recognizing what it means to be an engineer and what datacentric product development needs. (In a separate article I&#8217;ll share my thoughts on what I think an optimal organizational structure looks like when you adopt this model.)</p>
<p>The last question is what do we call this legion of skilled engineers? I&#8217;ve already professed my distaste for Software Engineer as I think this term confines the discipline to just software, which I&#8217;ve argued is only a small portion of the overall capability a person in this role needs. I could cheat since the organization I&#8217;m building is in the financial industry and stick with Financial Engineer but that still leaves an unnamed army to fend for themselves. Instead the emerging field of Computational Engineering [2] embodies the characteristics that I think datacentric product development needs. The reason is that the Computational Sciences are known for solving domain-specific problems using analytical and numerical methods, which of course includes writing software. People in these fields face the same challenges as those in product organizations: scaling, performance, reliability, accuracy of the model, etc. The term has not been widely adopted for Internet and business technology, but it is not difficult to see how it applies here as well.</p>
<p>If you&#8217;ve bought this argument and are interested in being a part of a new breed of engineer, don&#8217;t hesitate to contact me.</p>
<h3>References</h3>
<p>[1] <a href="http://www.independent.co.uk/news/education/schools/code-club-afterschool-group-teaches-children-how-to-become-programming-whizz-kids-7956967.html">Code Club: After-school group teaches children how to become programming whizz kids</a></p>
<p>[2] <a href="http://www.siam.org/students/resources/report.php">Graduate Education for Computational Science and Engineering</a></p>
]]></content:encoded>
			<wfw:commentRss>http://cartesianfaith.wordpress.com/2012/11/17/datacentric-product-development-and-the-rebirth-of-engineering/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6b4bc36b8607b256e0ec8e320291802a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">muxspace</media:title>
		</media:content>
	</item>
		<item>
		<title>Preview of functional programming syntax for futile.paradigm 2.1</title>
		<link>http://cartesianfaith.wordpress.com/2012/07/09/preview-of-functional-programming-syntax-for-futile-paradigm-2-1/</link>
		<comments>http://cartesianfaith.wordpress.com/2012/07/09/preview-of-functional-programming-syntax-for-futile-paradigm-2-1/#comments</comments>
		<pubDate>Tue, 10 Jul 2012 02:20:30 +0000</pubDate>
		<dc:creator>Brian Lee Yung Rowe</dc:creator>
				<category><![CDATA[functional programming]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://cartesianfaith.wordpress.com/?p=569</guid>
		<description><![CDATA[I&#8217;m developing a streamlined syntax for the next release of futile.paradigm. While this version is backwards compatible, it introduces a &#8230;<p><a href="http://cartesianfaith.wordpress.com/2012/07/09/preview-of-functional-programming-syntax-for-futile-paradigm-2-1/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=cartesianfaith.wordpress.com&#038;blog=6379897&#038;post=569&#038;subd=cartesianfaith&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>I&#8217;m developing a streamlined syntax for the next release of futile.paradigm. While this version is backwards compatible, it introduces a cleaner syntax for functional programming. The noteworthy improvements are a more natural pattern matching syntax, integrated (optional) type checking in functions, and a cleaner syntax for type constructors. Here are some examples.</p>
<h3>Fibonacci Sequence</h3>
<p>This sequence is a good example of a recursive definition.</p>
<pre class="brush: r; title: ; notranslate">fib(0) %as% 1
fib(1) %as% 1
fib(n) %as% { fib(n-1) + fib(n-2) }</pre>
<h3>Kronecker Delta</h3>
<p>The unit impulse is a common multipart function.</p>
<pre class="brush: r; title: ; notranslate">delta.k(0) %as% 1
delta.k(x) %as% 0</pre>
<h3>Inverse</h3>
<p>This example is a little contrived but is illustrative of the type checking you can perform.</p>
<pre class="brush: r; title: ; notranslate">inverse(matrix x) %as% solve(x)
inverse(numeric x) %as% x^-1</pre>
<h3>Multiline Functions</h3>
<p>Clearly not all functions are one liners, so we support multiline definitions as well.</p>
<pre class="brush: r; title: ; notranslate">
fn(x, y) %as% {
  z &lt;- x + y
  z * 2
}</pre>
<h3>Type Constructors</h3>
<p>The syntax for type constructors in previous versions of futile.paradigm is a little clunky. With f.p 2.1 the definition for type constructors can collapse into the same syntax as regular functions. Creating these types could thus look like this:</p>
<pre class="brush: r; title: ; notranslate">
Portfolio(x) %as% {
  returns &lt;- get.returns(x)
  list(returns=returns)
}

p &lt;- Portfolio(my.prices)</pre>
<p>The one caveat is that this may break the %isa% function, although with direct type checking it may not be needed anymore.</p>
<h3>Notes</h3>
<p>More complicated expressions should continue to use the %when% and %also% syntax, although the new syntax should cover ~85% of use cases (based on my own usage).</p>
<p>At times the ellipsis would be useful in multipart functions. I haven&#8217;t come up with a way of integrating it that I&#8217;m satisfied with. It really comes down to making sure the syntax and interpretation are deterministic. The other problem is that the extra tests involved with the ellipsis could impact performance. So this one is a work in progress.</p>
<p>Default values are another convenience that I&#8217;m working out how to integrate. While I am a big advocate of functional syntax, I think we should leverage the features that R has to offer. Again, this comes down to a deterministic evaluation that has good performance.</p>
<p>I&#8217;m toying around with some ideas for optimizations, particularly for tail recursion, but this is still in the idea stage.</p>
<p>Feel free to send me suggestions on syntax: I have not implemented this yet, although I plan to in the next week or so. Any test cases would be great as well.</p>
]]></content:encoded>
			<wfw:commentRss>http://cartesianfaith.wordpress.com/2012/07/09/preview-of-functional-programming-syntax-for-futile-paradigm-2-1/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6b4bc36b8607b256e0ec8e320291802a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">muxspace</media:title>
		</media:content>
	</item>
		<item>
		<title>Really simple replication in riak</title>
		<link>http://cartesianfaith.wordpress.com/2012/07/04/really-simple-replication-in-riak/</link>
		<comments>http://cartesianfaith.wordpress.com/2012/07/04/really-simple-replication-in-riak/#comments</comments>
		<pubDate>Wed, 04 Jul 2012 15:13:05 +0000</pubDate>
		<dc:creator>Brian Lee Yung Rowe</dc:creator>
				<category><![CDATA[erlang]]></category>
		<category><![CDATA[riak]]></category>

		<guid isPermaLink="false">http://cartesianfaith.wordpress.com/?p=559</guid>
		<description><![CDATA[Production applications typically have a separate environment for disaster recovery and business continuity. Depending on the needs of the application, &#8230;<p><a href="http://cartesianfaith.wordpress.com/2012/07/04/really-simple-replication-in-riak/">Continue reading &#187;</a></p><img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=cartesianfaith.wordpress.com&#038;blog=6379897&#038;post=559&#038;subd=cartesianfaith&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Production applications typically have a separate environment for disaster recovery and business continuity. Depending on the needs of the application, this may be a hot backup or warm standby. Either way you need to have your data replicated at your DR environment.</p>
<p>For riak clusters there is a simple way of doing this. By taking advantage of a post commit hook, you can have every object written to riak pushed to a node in a separate cluster. I&#8217;ve written a simple erlang library called <a href="https://github.com/muxspace/doppelganger">doppelganger</a> that does this. This approach keeps the clusters separate yet in a mirrored state as the diagram below illustrates.</p>
<p><a href="http://cartesianfaith.files.wordpress.com/2012/07/doppelganger.png"><img class="alignnone size-full wp-image-560" title="Doppelganger replication" src="http://cartesianfaith.files.wordpress.com/2012/07/doppelganger.png?w=529&#038;h=252" alt="" width="529" height="252" /></a></p>
<p>To use the library you need to do three things on your primary environment.</p>
<ol>
<li>Add doppelganger to your primary riak environment</li>
<li>Register doppelganger as a default post commit hook</li>
<li>Set the options for doppelganger in your riak app.config</li>
</ol>
<p>Note that the secondary environment needs no configuration as it is merely the target of the replication.</p>
<h4>Adding doppelganger to riak</h4>
<p>In riak&#8217;s app.config, add a term for custom erlang modules. You can specify any path you like, and I typically use something like /etc/riak/erlang.</p>
<pre><code>{add_paths, ["/etc/riak/erlang"]}</code></pre>
<p>Build doppelganger by running make and drop the compiled .beam files into that directory.</p>
<h4>Register doppelganger as a post commit hook</h4>
<p>If your riak_core section in the app.config does not have a default_bucket_props, then add the below term.</p>
<pre>{default_bucket_props, [
  {postcommit, [
    {struct, [{&lt;&lt;"mod"&gt;&gt;,&lt;&lt;"doppelganger"&gt;&gt;}, {&lt;&lt;"fun"&gt;&gt;,&lt;&lt;"replicate"&gt;&gt;}] }
   ]}
]}</pre>
<h4>Set options for doppelganger</h4>
<p>Doppelganger is meant to be unobtrusive. The only options it supports are: enabling the module and setting the target host and port. This configuration goes into a separate section in the app.config.</p>
<pre>{doppelganger, [
  {active, true},
  {riak_host, "your-doppelganger-host" },
  {riak_port, 8081} % Should be your PB port 
]}</pre>
<p>Once these steps are complete, fire up your secondary environment and then your primary environment. You should see the postcommit hook registered in any buckets you have defined and upon posting data to the primary it will appear in the secondary environment.</p>
<h3>Future Plans</h3>
<p>The next step is to handle network partitions or node failure in the secondary environment to ensure no data is lost. I also need to preserve the riak object metadata to ensure the replicated data is as close to the original as possible.</p>
]]></content:encoded>
			<wfw:commentRss>http://cartesianfaith.wordpress.com/2012/07/04/really-simple-replication-in-riak/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/6b4bc36b8607b256e0ec8e320291802a?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">muxspace</media:title>
		</media:content>

		<media:content url="http://cartesianfaith.files.wordpress.com/2012/07/doppelganger.png" medium="image">
			<media:title type="html">Doppelganger replication</media:title>
		</media:content>
	</item>
	</channel>
</rss>
