Papers

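Paper study notes, mostly on distributed systems, databases, and storage. I read papers on systems relevant to my work first (such as Kafka, ES, Raft, and Dynamo), then the rest. Distributed systems are large and complex. The gap between classic theory and concrete product implementations, and the trade-offs made along the way, can be explained with models, but the details are hard to reason through and even harder to verify. Because my theoretical foundation is not yet solid and my practical experience not yet deep, I often have to rely on intuition. The papers here are therefore pieces for assembling a blueprint of distributed systems, from which I can form my own understanding.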

Planned Reading

ID	Title	Author	Status
1	Highly Available Transactions: Virtues and Limitations		Done
2	A Critique of the CAP Theorem		Done
3	Consistency Tradeoffs in Modern Distributed Database System Design		Done
4	Consensus in the Cloud: Paxos Systems Demystified		Done
5	Understanding Replication in Databases and Distributed Systems		Done
6	Eventually Consistent: Building reliable distributed systems at a worldwide scale demands trade-offs between consistency and availability.	Werner Vogels	Done
7	Life beyond Distributed Transactions: an Apostate’s Opinion	Pat Helland	ToDo
8	ZooKeeper: Wait-free coordination for Internet-scale systems
9	Zab: High-performance broadcast for primary-backup systems
10	Raft: In Search of an Understandable Consensus Algorithm (Extended Version)
11	Distributed systems for fun and profit
12	Kafka: a Distributed Messaging System for Log Processing
13	Designing Access Methods: The RUM Conjecture
14	Consistency in Non-Transactional Distributed Storage Systems
15	Consistency models in distributed systems: A survey on definitions, disciplines, challenges and applications
16	Facebook: Cassandra - A Decentralized Structured Storage System
17	Google: The Chubby lock service for loosely-coupled distributed systems
18	Google: Bigtable: A Distributed Storage System for Structured Data
19	Amazon: Dynamo: Amazon's Highly Available Key-value Store
20	Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases
21	Probabilistically Bounded Staleness for Practical Partial Quorums
22	A Survey on the Evolution of Stream Processing Systems
23	The Case for Shared Nothing

Paper Collection

ID	Title	Author
1	The Anatomy of a Large-Scale Hypertextual Web Search Engine
2	Web Search for a Planet: The Google Cluster Architecture
3	The Google File System
4	MapReduce: Simplified Data Processing on Large Clusters
5	Bigtable: A Distributed Storage System for Structured Data
6	The Chubby lock service for loosely-coupled distributed systems
7	Interpreting the Data: Parallel Analysis with Sawzall
8	Pregel: a system for large-scale graph processing
9	Dremel: Interactive Analysis of Web-Scale Datasets
10	Percolator: Large-scale Incremental Processing Using Distributed Transactions and Notifications
11	MegaStore: Providing Scalable, Highly Available Storage for Interactive Services
12	Case Study GFS: Evolution on Fast-forward
13	Google File System II: Dawn of the Multiplying Master Nodes
14	Tenzing – A SQL Implementation on the MapReduce Framework
15	F1-The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business
16	Elmo: Building a Globally Distributed, Highly Available Database
17	PowerDrill: Processing a Trillion Cells per Mouse Click
18	Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers
19	Spanner: Google’s Globally-Distributed Database
20	Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
21	Omega: flexible, scalable schedulers for large compute clusters
22	CPI2: CPU performance isolation for shared compute clusters
23	Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams
24	F1: A Distributed SQL Database That Scales
25	MillWheel: Fault-Tolerant Stream Processing at Internet Scale
26	B4: Experience with a Globally-Deployed Software Defined WAN
27	The Datacenter as a Computer
28	Google brain-Building High-level Features Using Large Scale Unsupervised Learning
29	Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing
30	Shasta: Interactive Reporting at Scale
31	Goods: Organizing Google’s Datasets
32	FlumeJava: Easy, Efficient Data-Parallel Pipelines
33	Large-scale cluster management at Google with Borg
34	Borg: The Predecessor to Kubernetes
35	Spanner: Becoming a SQL System
36	The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing
37	Appraising Two Decades of Distributed Computing Theory Research
38	A brief history of Consensus 2PC and Transaction Commit
39	The Byzantine Generals Problem
40	Impossibility of distributed consensus with one faulty process
41	Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency
42	Time Clocks and the Ordering of Events in a Distributed System
43	The Part-Time Parliament
44	How to Build a Highly Available System Using Consensus
45	Paxos Made Simple
46	Paxos Made Live
47	Consensus on Transaction Commit
48	Why Do Computers Stop and What Can Be Done About It
49	On Designing and Deploying Internet-Scale Services
50	Single-Message Communication
51	How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs
52	Distributed Snapshots: Determining Global States of a Distributed System
53	Self-stabilizing systems in spite of distributed control
54	Wait-Free Synchronization
55	Solution of a Problem in Concurrent Programming Control
56	A New Solution of Dijkstra’s Concurrent Programming Problem
57	Life beyond Distributed Transactions: an Apostate’s Opinion
58	Hints for Computer System Design
59	Virtual Time and Global States of Distributed Systems
60	Timestamps in Message-Passing Systems That Preserve the Partial Ordering
61	Fundamentals of Distributed Computing: A Practical Tour of Vector Clock Systems
62	Knowledge and Common Knowledge in a Distributed Environment
63	Understanding Failures in Petascale Computers
64	Why Do Internet services fail, and What Can Be Done About It?
65	End-To-End Arguments in System Design
66	Rethinking the Design of the Internet: The End-to-End Arguments vs. the Brave New World
67	The Design Philosophy of the DARPA Internet Protocols
68	Uniform consensus is harder than consensus
69	Paxos made code – Implementing a high throughput Atomic Broadcast
70	Raft: In Search of an Understandable Consensus Algorithm
71	Problems, Unsolved Problems and Problems in Concurrency
72	Implementing fault-tolerant services using the state machine approach
73	White Paper Introduction to IEEE 1588 & Transparent Clocks
74	Unreliable Failure Detectors for Reliable Distributed Systems
75	A Relational Model of Data for Large Shared Data Banks
76	SEQUEL: A Structured English Query Language
77	Implementation of a Structured English Query Language
78	System R: Relational Approach to Database Management
79	Granularity of Locks and Degrees of Consistency in a Shared DataBase
80	Access Path Selection in a RDBMS
81	Notes on Data Base Operating Systems
82	The Transaction Concept: Virtues and Limitations
83	NONBLOCKING COMMIT PROTOCOLS
84	MVCC: Multiversion Concurrency Control - Theory and Algorithms
85	ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging
86	A Comparison of the Byzantine Agreement Problem and the Transaction Commit Problem
87	A Formal Model of Crash Recovery in a Distributed System
88	What Goes Around Comes Around
89	Anatomy of a Database System
90	Architecture of a Database System
91	Towards Robust Distributed Systems
92	Harvest, Yield, and Scalable Tolerant Systems
93	BASE an Acid Alternative
94	MapReduce: A major step backwards
95	The Log: What every software engineer should know about real-time data’s unifying abstraction
96	Dynamo: Amazon’s Highly Available Key-value Store
97	Cassandra – A Decentralized Structured Storage System
98	PNUTS: Yahoo!’s Hosted Data Serving Platform
99	Designs, Lessons and Advice from Building Large Distributed Systems
100	The Tail at Scale
101	How To Design A Good API and Why it Matters
102	The ganglia distributed monitoring system: design, implementation, and experience
103	Chukwa: A large-scale monitoring system
104	Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
105	Practical Byzantine Fault Tolerance
106	PacificA: Replication in Log-Based Distributed Storage Systems 2008
107	Dominant Resource Fairness: Fair Allocation of Multiple Resource Types
108	Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing
109	Consistent Hashing with Bounded Loads Google 2017
110	The φ Accrual Failure Detector
111	CAP Twelve Years Later: How the "Rules" Have Changed
112	A simple totally ordered broadcast protocol 2008
113	Virtual Time and Global States of Distributed Systems 2002
114	Paxos Made Practical
115	Using Paxos to Build a Scalable, Consistent, and Highly Available Datastore 2011
116	Consensus in the Presence of Partial Synchrony 1988
117	Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications 2003
118	Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems 2001
119	Kademlia: A Peer-to-Peer Information System Based on the XOR Metric 2002
120	A Scalable Content-Addressable Network 2001
121	Ceph: A Scalable, High-Performance Distributed File System 2006 OSDI Sage Weil
122	The Log-Structured Merge-Tree (LSM-Tree) 1996
123	HBase: A NoSQL Database 2017 Hiren Patel
124	Tango: Distributed Data Structures over a Shared Log 2013
125	Finding a needle in Haystack: Facebook's photo storage 2010
126	Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency 2011
127	Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing 2012
128	Scaling Distributed Machine Learning with the Parameter Server 2014
129	S4: Distributed Stream Computing Platform 2010
130	Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases 2017
131	Amazon Aurora: On Avoiding Distributed Consensus for I/Os, Commits, and Membership Changes 2018
132	Chain Replication for Supporting High Throughput and Availability
133	No compromises: distributed transactions with consistency, availability, and performance
134	Don’t Settle for Eventual: Scalable Causal Consistency for Wide-Area Storage with COPS
135	Secure Untrusted Data Repository (SUNDR)
136	The Case for Shared Nothing	Michael Stonebraker
137	the red book
138	SAGAS	Hector Garcia-Molina
139	ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging
140	Big Data: A Survey

Paper Sources

Translated collection of classic distributed systems papers: http://duanple.com/?p=170
DDIA: https://github.com/ept/ddia-references
MIT: http://dsrg.pdos.csail.mit.edu/papers/
MIT6.824: https://pdos.csail.mit.edu/6.824/schedule.html
Principles of Distributed Database Systems: https://cs.uwaterloo.ca/~ddbook/downloads/appendix/References.pdf
Kafka papers: https://kafka.apache.org/books-and-papers
CMU15721: https://15721.courses.cs.cmu.edu/spring2020/schedule.html
Distributed Systems Reading List: https://dancres.github.io/Pages/
Lamport: https://lamport.azurewebsites.net/pubs/pubs.html
Consensus: https://github.com/dgryski/awesome-consensus
Awesome Distributed Systems: https://github.com/theanalyst/awesome-distributed-systems
Readings in distributed systems: http://christophermeiklejohn.com/distributed/systems/2013/07/12/readings-in-distributed-systems.html
Foundational distributed systems papers: http://muratbuffalo.blogspot.com/2021/02/foundational-distributed-systems-papers.html
Distributed Systems Readings: https://henryr.github.io/distributed-systems-readings/
Developer And Architect: https://github.com/xiaozhiliaoo/my-collect
100 open source Big Data architecture papers for data professionals: https://github.com/xiaozhiliaoo/bigdata/blob/main/paper/100paper-paypal-anil-madan.md
Papers collected from courses at top universities worldwide: https://xiaozhiliaoo.github.io/distributed-system-practice/course/
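Various distributed systems textbooks, such as "Distributed Systems: Principles and Paradigms" and "Distributed Systems: Concepts and Design"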
https://github.com/rxin/db-readings
The CMU_15445 and CMU_15721 courses.
