AI, startup hacks, and engineering miracles from your friends at Faraday

Open source at Faraday

Here are Faraday's contributions to open source that we use every day in production.

(beta release) 3rd gen batch processing on k8s: falconeri

falconeri is a distributed batch job runner for Kubernetes (k8s). It's compatible with Pachyderm pipeline definitions, but it's simpler and handles autoscaling and related concerns properly.

(beta release) Seamless transfer between Postgres/Citus and BigQuery: dbcrossbar

dbcrossbar handles all the details of transferring tables and data to and from Postgres and Google BigQuery. It also knows about Citus, the leading Postgres horizontal sharding solution, so it can do highly efficient transfers between Citus clusters and BigQuery.

A new standard for secrets: Secretfile

secret_garden (Ruby), vault-env (JS), and credentials-to-env (Rust) all implement a standard we call Secretfile(s):

# /app/Secretfile
DATABASE_URL secrets/database/$VAULT_ENV:url
REDIS_URL secrets/redis/$VAULT_ENV:url

Then you use it like this: SecretGarden.fetch('DATABASE_URL').
Clients implementing this standard are meant to check the environment for DATABASE_URL first and, failing that, look up the secret in HashiCorp Vault (interpolating $VAULT_ENV into production, staging, etc. first). It's very useful for development, where your DATABASE_URL is just postgres://seamus@127.0.0.1:5432/myapp - you can save this in a local .env file and only mess with Vault in production/staging.
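
To make the lookup order concrete, here's a rough sketch in Ruby of what a Secretfile-aware client does. This is not secret_garden's actual code; fetch_secret and read_from_vault are hypothetical names standing in for the real client and a real Vault call:

# Hypothetical sketch of the Secretfile lookup order (not secret_garden's implementation)
def fetch_secret(name, secretfile: '/app/Secretfile')
  # 1. A plain environment variable always wins (handy with a local .env file).
  return ENV[name] if ENV[name]

  # 2. Otherwise find the matching Secretfile line, e.g.
  #    "DATABASE_URL secrets/database/$VAULT_ENV:url"
  line = File.readlines(secretfile).find { |l| l.start_with?("#{name} ") }
  raise KeyError, "#{name} not found in #{secretfile}" unless line

  # 3. Interpolate $VAULT_ENV and split the "path:field" reference.
  path, field = line.split(' ', 2).last.strip.split(':', 2)
  path = path.sub('$VAULT_ENV', ENV.fetch('VAULT_ENV'))

  # 4. Read the secret from Vault; read_from_vault stands in for a real Vault client call.
  read_from_vault(path)[field]
end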

Lightning fast CSV processing: catcsv and scrubcsv

catcsv is a very fast CSV concatenation tool that gracefully handles headers and compression, including Google's Snappy. We store everything on S3 and GCS szip'ed using BurntSushi's szip.

$ cat a.csv
city,state
burlington,vt

$ cat b.csv
city,state
madison,wi

$ szip a.csv

$ ls
a.csv.sz
b.csv

$ catcsv a.csv.sz b.csv
city,state
burlington,vt
madison,wi

Of course, before you cat files, sometimes you need to clean them up with scrubcsv:

$ scrubcsv giant.csv > scrubbed.csv
3000001 rows (1 bad) in 51.58 seconds, 72.23 MiB/sec

Lightning-fast fixed-width to CSV: fixed2csv

fixed2csv converts fixed-width files to CSV very fast. You start with this:

first     last      middle
John      Smith     Q
Sally     Jones

You should be able to run:

$ fixed2csv -v 10 10 6 < input.txt
first,last,middle
John,Smith,Q
Sally,Jones,

World's fastest geocoder: node_smartystreets

node_smartystreets is the world's fastest geocoder client. We shell out to its binary rather than using it as a library. It will do 10k records/second against the SmartyStreets geocoding API. If you don't have an Unlimited plan, use it with extreme caution.

Better caching: lock_and_cache

lock_and_cache (Ruby) and lock_and_cache_js (JS) go beyond normal caching libraries: they lock the calculation while it's being performed. Most caching libraries don't do locking, meaning that >1 process can be calculating a cached value at the same time. Since you presumably cache things because they cost CPU, database reads, or money, doesn't it make sense to lock while caching?

def expensive_thing
  @expensive_thing ||= LockAndCache.lock_and_cache("expensive_thing/#{id}", expires: 30) do
    # do expensive calculation
  end
end

It uses Redis for distributed caching and locking, so this is not only cross-process but also cross-machine.
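
If you're curious what "lock while calculating" looks like in practice, here's a rough sketch of the general pattern using the redis gem. This is illustrative only - the key names, timeouts, and polling loop are our own, not lock_and_cache's actual internals:

require 'redis'
require 'json'

# Illustrative sketch of the lock-then-cache pattern; not lock_and_cache's real code.
def cached_with_lock(redis, key, expires:)
  cached = redis.get("cache:#{key}")
  return JSON.parse(cached) if cached

  if redis.set("lock:#{key}", Process.pid, nx: true, ex: expires)
    # We won the lock: do the expensive work exactly once, then publish the result.
    begin
      value = yield
      redis.set("cache:#{key}", JSON.generate(value), ex: expires)
      value
    ensure
      redis.del("lock:#{key}")
    end
  else
    # Someone else is already calculating; wait for their cached result instead.
    sleep 0.1 until (cached = redis.get("cache:#{key}"))
    JSON.parse(cached)
  end
end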

Better state machine: status_workflow

status_workflow handles state transitions with distributed locking using Redis. Most state machine libraries either don't do locking or use Postgres advisory locks.

class Document < ActiveRecord::Base
  include StatusWorkflow
  status_workflow(
    archive_requested: [:archiving],
    archiving: [:archived],
  )
end

Then you can do:

document.enter_archive_requested!

It's safe to use in a horizontally sharded environment because it uses distributed locking - the second process that tries to do this will get an InvalidTransition error, even if both try in the same microsecond.
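
In practice the losing process just rescues that error and moves on. A minimal sketch (the exact error class name here is our assumption):

begin
  document.enter_archive_requested!
rescue StatusWorkflow::InvalidTransition
  # Another process won the race and already started this transition.
end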

Rust build tools: rust-musl-builder and heroku-buildpack-rust

rust-musl-builder is how we build Rust apps on top of Alpine. It also drives heroku-buildpack-rust, the preeminent way of running Rust on Heroku.

Minimal postgres for node: simple-postgres

simple-postgres (JS) is just the essentials to talk to Postgres from Node. We particularly love its use of template literals for apparently magical escaping:

let account = await db.row`
  SELECT *
  FROM accounts
  WHERE id = ${id}
`

Yes, that's safe!

Minimal HTTP server: srvr

srvr (JS) is a small HTTP server that speaks for itself:

  • everything express does
  • better
  • less code
  • no dependencies
  • websockets

Proper Docker API support for Rust: boondock

boondock is a rewrite of rust-docker to be more correct.

Coordinate docker-compose: cage

cage boots multiple docker-compose.ymls, each as a pod. It's sort of like a local k8s. You configure it with a bunch of docker-compose files:

pods/
├── admin.yml (a pod containing adminweb and horse)
├── common.env (common env vars)
├── donkey.yml (a pod containing donkey)
├── placeholders.yml (development-only pod with redis, db, etc.)
[...]

Local development looks like this:

$ cage pull
==== Fetching secrets from vault into config/secrets.yml
==== Logging into ECR
Fetching temporary AWS 'administrator' credentials from vault
Pulling citus        ... done
Pulling citusworker1 ... done
Pulling citusworker2 ... done
Pulling queue        ... done
Pulling redis        ... done
Pulling s3           ... done
Pulling smtp         ... done
Pulling vault        ... done
Pulling horse        ... done
Pulling adminweb     ... done
[...]
$ cage up
Starting fdy_citusworker2_1 ... done
Starting fdy_smtp_1         ... done
Starting fdy_citus_1        ... done
Starting fdy_vault_1        ... done
Starting fdy_citusworker1_1 ... done
Starting fdy_queue_1        ... done
Starting fdy_s3_1           ... done
Starting fdy_redis_1        ... done
Starting fdy_horse_1 ... done
Starting fdy_adminweb_1 ... done
[...]
$ cage stop
Stopping fdy_citusworker2_1 ... done
Stopping fdy_vault_1        ... done
Stopping fdy_citus_1        ... done
Stopping fdy_s3_1           ... done
[...]

Fixed-up Rust crates: rust-amqp

rust-amqp@tokio (Rust) is our rewrite of the internals of the rust-amqp crate using proper Tokio. It's much more reliable and needs to be merged upstream.

Conclusion

We love open source at Faraday!

How customer-centric marketers use machine learning

You've probably noticed the growing hype around artificial intelligence (AI) in marketing. From chatbots to content creation to programmatic advertising — it seems like every other MarTech or AdTech platform is baking in some sort of AI capability.

With so many applications, it's easy to lose sight of what's most important in implementing an effective, optimized marketing strategy: deeply understanding your customers.


According to Forbes Insights, only 13% of businesses express a high degree of confidence that they are making the most of their customer data.

Stepping back, what does AI even mean for marketing?

At a high level, AI refers to a computer's replication of some aspect of human intelligence — pretty ambiguous, right? AI, as it exists today, is an umbrella term for a range of computer-enabled data analysis techniques — the most relevant and widely-practiced in marketing being machine learning.

Machine learning (ML) is the process of training computers to “learn” to recognize important patterns and trends in large datasets, with the goal of developing data models that can quickly categorize new data inputs and predict likely outcomes.

So, what does that mean for you, the customer-centric marketer? By using your customer data (your training data) as the basis of machine learning models, you can start to generate deeper customer insights and make better behavioral predictions: your prospects' and customers' likelihood to convert on certain campaigns, increase their purchase frequency, churn or lapse, or something much more specific.

Leveraging machine learning in your marketing strategy is no longer a luxury — it's a necessity. As competition increases and ad space gets more crowded, consumers have more choices of businesses to engage with, making machine learning critical to efficiently reaching the right people and keeping your customers engaged.

ML-driven insights marketers can't ignore

It should come as no surprise that the world's top brands are efficiently scaling growth by leveraging machine learning to prioritize their resources and personalize experiences across their customer lifecycles.

Here are some of the most important ML-driven insights marketers are using to craft better customer experiences and optimize their performance.

Behavioral insights and predictions

A vital piece of giving your prospects and customers a memorable experience with your brand is knowing who to engage with and when.

Have you ever been in the position as a consumer where you're targeted with ads that don't align with who you are or where you are in your customer journey? Perhaps you see an ad trying to get you to buy an item you've already bought, or offering you a first-purchase discount code when you're already a customer.

As a consumer, these experiences can be annoying and frustrating. And as a marketer, your message can end up diluted instead of impactful.

Behavioral predictions are crucial to proactively engaging your customers. Rather than reacting to your customers' behaviors, you're able to anticipate them and market to your most valuable customers at the right time, with content that corresponds to where they are in their journeys. This helps you prioritize your resources, optimize your marketing spend, and cut through the noise to better reach your target audience.

How can machine learning help predict customer behavior?


Relatively straightforward machine learning algorithms can uncover predictive patterns hidden deep in your customer data. A random forest, or random decision forest, model analyzes your historical customer data, building a series of decision trees to predict the likelihood that future inputs (e.g. new leads) will result in a target outcome (e.g. making a purchase).

If you advertise on Facebook, you've likely come across lookalike audiences, which are generated in a similar way. Using lead conversion as the example outcome, these models essentially predict the degree to which a new lead "looks like" leads who've successfully converted to customers in the past.

Naturally, the data used to train or build your behavioral models will influence how they make predictions. Done properly, these predictive insights can have a tremendous impact on your return on ad spend, overall customer acquisition costs, and customer satisfaction.

[Case study: Saatchi Art lead generation]

Persona-based insights and predictions

While predicting your customer's next move is immensely helpful in reaching the right people at the right time, it's not where the road to truly optimized marketing ends. Effective and thoughtful engagement requires an understanding of who your customers are as real people, so you can create hyper-personalized experiences that evoke emotional responses.

Salesforce Research revealed that 84% of customers say that being treated like a person is very important to winning their business. If you show your audience that you understand what motivates them to interact with you, whether they're an early prospect or a loyal customer, the relationship and trust between you and your customers grow stronger.

So how can machine learning help you personalize experiences at scale?

Customer clustering

Buyer personas, semi-fictional representations of your target customers, are instrumental in creating personalized content and creative that truly resonates with them. Traditionally, personas were created using basic demographic data, some psychographic data from surveys or focus groups, and a good amount of human intuition. While this approach has worked for years, it leaves too much room for human bias.

Customer clustering leverages a different machine learning technique than behavioral modeling: unsupervised machine learning. Rather than uncovering patterns that are predictive of a known outcome (e.g. likely to convert), unsupervised algorithms, like K-means, sort data into distinct groups based on shared attributes. The resulting groups, or clusters, form the foundation of unbiased, truly data-driven personas.


As time goes on and new data is collected, running the clustering algorithm again may reveal new emergent personas, enabling you to refresh your messaging, creative, and other personalization efforts to stay relevant as your customer base evolves.

Burrow, a disruptive direct-to-consumer furniture brand, uses ML-driven personas to identify what color couches their audience segments see in targeted ads. They found that customers who are older, live in single-family homes, and have kids are more likely to buy couches in darker colors; customers who are younger, live in apartments, and have few or no kids are more likely to buy couches in lighter colors. With these insights, Burrow was able to push creative that reflected these attributes to the audiences that possessed them.

[Case study: Burrow omnichannel marketing]

Location-based insights and predictions

Virtually all consumer-facing companies need to consider location in their marketing strategies, whether it's out-of-home advertising, brick-and-mortar marketing, or geotargeted search campaigns. However, these tactics are often expensive and difficult to do well.


Geospatial intelligence refers to a suite of geospatial analysis techniques enhanced with a combination of behavioral and persona-based insights. Whether you're considering increasing investment in your existing markets, expanding to new ones, or looking to drive foot traffic to specific retail or branch locations, incorporating predictive insights into your geospatial analyses can drastically improve ROI.

  • Predictive penetration analysis aims to identify the maximum return you can expect from further investments in your existing markets. The resulting insights can help you understand the performance of your existing sites and determine whether you should increase investments or cut back entirely.
  • Predictive market analysis identifies specific hotspots of existing customers and customer lookalikes to guide where you should put new sites and optimize acquisition-focused out-of-home and geo-targeted SEM campaigns.
  • Predictive trade area analysis identifies the maximum distance customers are likely to travel to your existing retail sites. The resulting insights can help you optimize individually-targeted and geo-targeted campaigns focused on driving foot traffic to those sites.

Implementing machine learning

While several complex applications of AI are still years — or even decades — away from being fully developed, the democratization of machine learning is enabling nimble marketing teams to generate these predictive customer insights without having to spend millions on expensive consultants or hire large data science teams.

If you're considering adopting machine learning or looking for ways to expand your analytics team's bandwidth, our article, Is your marketing team AI-ready?, discusses important considerations to ensure successful implementation.

[Guide: how to build an AI marketing stack]


PRESS RELEASE - State of Vermont chooses Faraday to optimize targeted marketing campaigns

Local Artificial Intelligence Company To Work With State Of Vermont To Help Grow Workforce

BURLINGTON, Vt., Feb. 13, 2019 /PRNewswire/ -- Faraday is pleased to announce a strategic contract with the State of Vermont to use its platform to drive more interest and engagement among people looking to relocate to Vermont. This technology will be used in conjunction with the Vermont Department of Economic Development's ThinkVermont initiative and website, which engage a wide-ranging audience around opportunities to live and work in Vermont.

Read more here



The DTC movement, AI, and snowshoes

Last week the Faraday sales team took some time off from the usual day-to-day grind to reflect on the last year, talk about emerging trends in the consumer landscape, and get some much-needed exercise in the Green Mountains of Vermont—I'll admit they're a little greener in the summer.

[Photo: Faraday sales retreat, 2019]

We briefly talked goals and tactics (would it really be a sales retreat without some numbers?), but the bigger discussions revolved around Faraday's why — specifically, why data-driven companies will eventually dominate every consumer market. I'm not going to pitch you here, but I will share my biggest takeaways from the day...

  1. We've all noticed the direct-to-consumer (DTC) movement that's rendering traditional marketing channels obsolete. Bypassing distribution channels is better for the bottom line and gives brand-manufacturers greater control over end-customer data—that is to say, meaningful data—which smart brands are using to optimize everything about their business. Here's the big takeaway: the DTC movement is expanding beyond retail and goods to literally every single consumer market (i.e., consumer finance, real estate, transportation, home improvement—the list goes on). My colleague Riley just published a great article about how the DTC movement is transforming financial services. Definitely worth checking out for a deeper dive into this DTC shift.
  2. To echo Riley, companies who embrace this movement will thrive, while others slowly fade away. Intelligent use of data will distinguish the thrivers and faders, making it clear that AI/ML adoption is no longer a luxury—it's an absolute necessity.
  3. Here's my third and final takeaway: if your team ever plans a snowshoe expedition (which is probably just a Vermont thing), make sure to wear snowshoes that actually fit, or you'll end up tripping yourself constantly, as I did in Dad's gigantic snowshoes.

Regardless of your company's industry, we can all agree on one emergent truth: the future of consumer marketing requires AI/ML to stay relevant.




PRESS RELEASE - NYSERDA chooses Faraday AI Platform to reduce customer acquisition costs

NYSERDA Chooses Faraday AI Platform To Reduce Customer Acquisition Costs For Clean Energy Technologies

BURLINGTON, Vt., Oct. 10, 2018 /PRNewswire/ -- Faraday, Inc. has been chosen to provide its multichannel, data-driven customer targeting tools and complementary consulting services to contractors participating in New York State Energy Research and Development Authority (NYSERDA) programs, based on its unique artificial intelligence (AI) platform and experience in clean energy.

Read more here