Author Archives: lipeng

Rewrite lock, copy-on-write

Rewrite lock: lock the data, write it, then release the lock.

Copy-on-write: copy the data to a new place, update the data in the new place, then switch the data's reference over to the new place.
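As a self-contained sketch (the class and method names here are mine, not from any particular library), copy-on-write can be implemented with an atomic reference swap:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Minimal copy-on-write holder: readers always see a consistent snapshot,
// writers copy the data, modify the copy, then swap the reference.
class CowList<T> {
    private final AtomicReference<List<T>> ref = new AtomicReference<>(new ArrayList<>());

    List<T> snapshot() {              // lock-free read of the current version
        return ref.get();
    }

    void add(T item) {
        while (true) {
            List<T> current = ref.get();
            List<T> copy = new ArrayList<>(current);  // copy into a new place
            copy.add(item);                           // update the new place
            if (ref.compareAndSet(current, copy)) {   // switch the reference
                return;
            }
        }
    }
}
```

`java.util.concurrent.CopyOnWriteArrayList` implements the same idea; the rewrite-lock alternative would instead guard every write with `synchronized` or a `ReentrantLock`.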

Write-ahead log (WAL)

The WAL persists each operation to disk first, then applies it to the cache. Persisting every single operation to disk is inefficient, so instead do batching: this improves performance and limits what a crash can lose to one batch.

Flushing every log write to the disk gives a strong durability guarantee (which is the main purpose of having logs in the first place), but this severely limits performance and can quickly become a bottleneck. If flushing is delayed or done asynchronously, it improves performance but there is a risk of losing entries from the log if the server crashes before entries are flushed. Most implementations use techniques like Batching, to limit the impact of the flush operation.
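A minimal sketch of the batching idea in Java (class names and the log format are mine, not from any real system): append() only buffers in memory, and flush() writes the whole batch to disk in one call.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Minimal write-ahead log: append() buffers entries; flush() writes the
// whole batch to disk at once. A crash loses at most the current batch.
class WriteAheadLog {
    private final Path file;
    private final List<String> batch = new ArrayList<>();

    WriteAheadLog(Path file) {
        this.file = file;
    }

    void append(String entry) {
        batch.add(entry);             // cheap in-memory append
    }

    void flush() {                    // one disk write per batch, not per entry
        try {
            Files.write(file, batch, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            batch.clear();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A real WAL would also force the data to the physical disk on flush (e.g. `FileChannel.force`), since `Files.write` alone only hands the bytes to the OS.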

mysql on mac

1. restart
/usr/local/bin/mysql.server restart

2. show mysql variables
mysqladmin variables

Different ways to read an AWS access key/secret, using different CredentialsProviders

  1. An instance can have an instance role. An application running on the instance can use InstanceProfileCredentialsProvider to retrieve the instance role's credentials. For example, the instance_profile of an EMR cluster is the role for the EMR instances.
  2. STSAssumeRoleSessionCredentialsProvider: assumes an IAM role through STS and manages the temporary session credentials.
  3. AWSStaticCredentialsProvider: wraps a fixed access key/secret.
  4. AWSCredentialsProviderChain: tries the configured providers one by one until it finds one that returns credentials.

Here is a code example
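The classes above come from the AWS Java SDK; as a self-contained illustration of how such a chain resolves credentials, here is a sketch with my own (non-SDK) interface and class names:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the idea behind AWSCredentialsProviderChain: ask each provider
// in order and return the first credentials found.
interface CredentialsProvider {
    String getCredentials();          // "key:secret", or null if unavailable
}

class CredentialsProviderChain implements CredentialsProvider {
    private final List<CredentialsProvider> providers;

    CredentialsProviderChain(CredentialsProvider... providers) {
        this.providers = Arrays.asList(providers);
    }

    @Override
    public String getCredentials() {
        for (CredentialsProvider p : providers) {
            String creds = p.getCredentials();   // test one provider
            if (creds != null) {
                return creds;                    // first success wins
            }
        }
        throw new IllegalStateException("no provider could supply credentials");
    }
}
```

In the AWS Java SDK v1 the analogous call is roughly `new AWSCredentialsProviderChain(provider1, provider2, ...)`, passing the real providers listed above.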

Category: aws

Find the count of pairs in an array whose sum is greater than a target

Given an array [4 2 1 3 5] and a target of 4, return the number of pairs whose sum is greater than the target. Duplicates are allowed: [2, 3] and [3, 2] are different pairs, and [3, 3] is ok.

Technique: 1. sort the array; 2. use two pointers.

[1 2 3 4 5]

[1, 4] -> [1, 4], [1, 5]
[2, 3] -> [2, 3], [2, 4], [2, 5]
[3, 2] -> [3, 2], [3, 3], [3, 4], [3, 5]
[4, 1] -> [4, 1], [4, 2], [4, 3], [4, 4], [4, 5]
[5, 1] -> [5, 1], [5, 2], [5, 3], [5, 4], [5, 5]

2. Duplicates not allowed: [a, b] and [b, a] count as one pair.

[1 2 3 4 5]

[a, b]
[1, 4] -> [1, 4], [1, 5]
[2, 3] -> [2, 3], [2, 4], [2, 5]
[3, 2] -> [3, 2], [3, 3], [3, 4], [3, 5]
[4, 1] -> [4, 1], [4, 2], [4, 3], [4, 4], [4, 5]
[5, 1] -> [5, 1], [5, 2], [5, 3], [5, 4], [5, 5]

In each row, we keep only the pairs [a, b] with a < b, so every unordered pair is counted exactly once. With the two-pointer scan this falls out naturally: whenever a[l] + a[r] > target, the elements at indices l through r − 1 all pair with a[r], contributing r − l pairs, and then r moves left.
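Both variants can be sketched together (class and method names are mine):

```java
import java.util.Arrays;

class SumPairs {
    // Distinct pairs (i < j) with a[i] + a[j] > target: sort, then two
    // pointers. When a[l] + a[r] > target, every index in [l, r) also
    // pairs with r, contributing r - l pairs: (l, r), (l+1, r), ..., (r-1, r).
    static int countDistinctPairs(int[] nums, int target) {
        int[] a = nums.clone();
        Arrays.sort(a);
        int l = 0, r = a.length - 1, count = 0;
        while (l < r) {
            if (a[l] + a[r] > target) {
                count += r - l;
                r--;
            } else {
                l++;
            }
        }
        return count;
    }

    // Ordered pairs, duplicates allowed ([2,3] and [3,2] both count, and
    // [3,3] counts): each distinct pair counts twice, plus the "diagonal"
    // pairs [x, x] with 2x > target.
    static int countOrderedPairs(int[] nums, int target) {
        int count = 2 * countDistinctPairs(nums, target);
        for (int x : nums) {
            if (2 * x > target) {
                count++;
            }
        }
        return count;
    }
}
```

For [4 2 1 3 5] and target 4 this gives 8 distinct pairs and 19 ordered pairs, matching the tables above.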



Policy setup: allowing an AWS account to upload to S3

Assume Role Way

  1. Create an AWS user. This user doesn't have any policy attached.


2. After the user account is created, AWS shows the ACCESS_ID and ACCESS_SECRET; copy them somewhere safe.

3. Create an IAM role. This role needs a policy that grants access to the S3 bucket.


4. This role should also have a trust relationship with the user account we've just created.


5. Locally, run the command below.

AWS_ACCESS_KEY_ID=xxxx AWS_SECRET_ACCESS_KEY=xxxx aws sts assume-role --role-arn ${assume_role_arn} --role-session-name "RoleSession1"

It will then output the assumed role's key/secret/session_token. For this to work, the role's trust relationship must allow this user account to assume the role.

6. Copy the key/secret/session_token and run the command below; it executes the S3 operations.



User Way

We can create a user that directly has a policy granting access to the S3 bucket.


Then we can directly run the command below to access the S3 bucket with the user's credentials, instead of assuming a role. However, this approach is not recommended.


Category: aws

What is bad code, OO design from Uncle Bob

Rigid code: modules are coupled. A change in module1 requires a change in module2, which in turn requires a change in module3.

Fragile code: a change in module1 causes issues in other modules or systems that seem completely unrelated to module1. Bizarre, weird breaks: like when your car window can't be opened, the mechanic fixes the window, and then the engine won't start.

Dependencies: I want to use someone's code, and it does solve the desired problem, but it also drags in other problems: weird data structures, databases, etc.

OO: encapsulation, inheritance, polymorphism

OO weakens encapsulation, because member variables must be declared in the class with public/private/protected/default modifiers. In an older language such as C, callers saw only the function declarations in the header and just called the functions; that was already good encapsulation.

Bastion host configuration and private key in the ~/.ssh folder

We need to ssh to the bastion host and, from there, ssh to the xxx.ec2.internal host. The configuration in the ~/.ssh/config file looks like this:

# Applies to every *.ec2.internal host; the ProxyCommand hops through the bastion.
Host *.ec2.internal
  # default username on the final host: hadoop@xxx.ec2.internal
  User hadoop
  # the private ssh key
  IdentityFile ~/.ssh/ssh-private.key
  UseKeychain yes
  # {bastion-user-name}@{bastion-host-name} are placeholders for your bastion
  ProxyCommand ssh -W %h:%p {bastion-user-name}@{bastion-host-name}

So, later we can simply run "ssh abc.ec2.internal", and it will connect through the bastion host.

To only supply the private key for all hosts:

Host *
  IdentityFile ~/.ssh/ssh-private.key
  UseKeychain yes

The one-line equivalent, without a config file, is:

ssh -o ProxyCommand='ssh -W %h:%p {bastion-user-name}@{bastion-host-name}' username@{target-host-ip}
Category: aws

IAM assume policy, IAM policy

1. Create an IAM role. While creating it, define the IAM assume-role policy. The assume policy specifies who can assume this IAM role.
2. Define the IAM policy and attach it to the role.



In the AWS UI, the assume policy is shown in the Trust relationships tab, and the normal IAM policy is shown in the Permissions tab.

Below is an example of using Terraform to create a role with an IAM assume policy and an IAM role policy:

# Role name and the trust (assume-role) policy principal below are illustrative.
resource "aws_iam_role" "server_role" {
  name = "server_role"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{ "Effect": "Allow", "Principal": { "Service": "ec2.amazonaws.com" }, "Action": "sts:AssumeRole" }]
}
EOF
}

resource "aws_iam_policy" "server_policy" {
  name        = "server_policy"
  path        = "/"
  description = "TBD"

  # The actions/resources are illustrative (e.g. S3 access).
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{ "Sid": "", "Effect": "Allow", "Action": ["s3:*"], "Resource": ["*"] }]
}
EOF
}

resource "aws_iam_role_policy_attachment" "server_policy" {
  role       = "${aws_iam_role.server_role.name}"
  policy_arn = "${aws_iam_policy.server_policy.arn}"
}
Category: aws